CVPR

2023-04-03 03:53 | Source: compiled from the web | Views: 265

CVPR-2022-Papers


Official website: https://cvpr2022.thecvf.com/

Conference dates: June 19-24, 2022

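❣❣❣ The list of papers accepted to CVPR 2022 was recently announced: 2,067 in total! All of the papers have now been released, so stay tuned! ❣❣❣ To download all of the papers as a single archive, reply "paper" to the 【我爱计算机视觉】 WeChat official account. For the categorized collection of survey papers from previous years, see ↘️CV-Surveys (under construction).

Categorized paper collections for 2022: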

↘️CVPR-2022-Papers ↘️WACV-2022-Papers

Categorized paper collections for 2021:

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

Categorized paper collections for 2020:

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

Contents 🐱 🐶 🐯 🐺

1. Others
2. Image Segmentation
3. Image Processing
4. Image Captioning
5. Object Detection
6. Object Tracking
7. Point Cloud
8. Action Detection (human action detection and recognition)
9. Human Pose Estimation
10. 3D (3D vision)
11. Face
12. Image-to-Image Translation
13. GAN
14. Video
15. Transformer
16. Semi/Self-Supervised Learning
17. Medical Image
18. Person Re-Identification
19. Neural Architecture Search
20. Autonomous Vehicles
21. UAV/Remote Sensing/Satellite Image
22. Image Synthesis/Generation
23. Image Retrieval
24. Super-Resolution
25. Fine-Grained/Image Classification
26. GCN/GNN
27. Pose Estimation (object pose estimation)
28. Style Transfer
29. Augmented Reality/Virtual Reality/Robotics
30. Visual Question Answering
31. Vision-Language
32. Data Augmentation
33. Human-Object Interaction
34. Model Compression/Knowledge Distillation/Pruning
35. OCR
36. Optical Flow
37. Contrastive Learning
38. Meta-Learning
39. Continual Learning
40. Adversarial Learning
41. Incremental Learning
42. Metric Learning
43. Multi-Task Learning
44. Federated Learning
45. Dense Prediction
46. Scene Graph Generation
47. Few/Zero-Shot Learning/Domain Generalization/Adaptation
48. Visual Grounding
49. Image Geo-localization
50. Anomaly Detection
51. Optics, Geometry, and Light Field Imaging
52. Human Motion Forecasting
53. Sign Language Translation
54. Dataset
55. Novel View Synthesis
56. Sound
57. Gaze Estimation
58. Neural Rendering
59. Animation
60. Visual Emotion Analysis

Clustering

DeepDPM: Deep Clustering With an Unknown Number of Clusters⭐code

Scene Flow

Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation⭐code

Graph Recognition

Improving Subgraph Recognition With Variational Graph Information Bottleneck⭐code

Motion Blur

Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos

Portrait Eyeglasses and Shadow Removal

Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data⭐code

Lip Reading

Sub-Word Level Lip Reading With Visual Attention

Analog Clock Reading

It's About Time: Analog Clock Reading in the Wild⭐code🏠project

Fingerprinting

Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations😮oral

Sketch-Based Image Manipulation

SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches⭐code🏠project

Sketch Recognition

Finding Badly Drawn Bunnies

Debiasing

Debiased Learning From Naturally Imbalanced Pseudo-Labels⭐code

Line Segment Classification

Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World

Interactive object understanding

Human Hands as Probes for Interactive Object Understanding⭐code🏠project

Digital Humans

GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping⭐code🏠project

Reinforcement Learning

DECORE: Deep Compression With Reinforcement Learning

Visual Relationship Detection

A Probabilistic Graphical Model Based on Neural-symbolic Reasoning for Visual Relationship Detection

Crack Recognition

Geometry-Aware Guided Loss for Deep Crack Recognition

Eye Authentication

EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images

Audio-Visual Event Localization

Cross-Modal Background Suppression for Audio-Visual Event Localization⭐code

Unbiased Learning

A Conservative Approach for Unbiased Learning on Unknown Biases⭐code

Object Proposal Generation

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

Lip Reading

Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading⭐code🏠project

Correspondence Learning

MS2DG-Net: Progressive Correspondence Learning via Multiple Sparse Semantics Dynamic Graph⭐code

Visual Localization

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation⭐code🏠project

Visual Recognition

Causal Transportability for Visual Recognition⭐code
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
Contextual Debiasing for Visual Recognition With Causal Mechanisms

Long-term action quality assessment

Likert Scoring With Grade Decoupling for Long-Term Action Assessment

Motion Recognition

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition⭐code

CNN

An Image Patch Is a Wave: Phase-Aware Vision MLP⭐code

Volume Rendering

DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering⭐code

virtual correspondences

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry🏠project

Infrared Measurement

Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements

4D Scene Capture

HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR⭐code🏠project

Morphable Head Avatars

I M Avatar: Implicit Morphable Head Avatars From Videos⭐code🏠project

Activity Anticipation

A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-Shot Representation Forecasting

Mirror Detection

Learning Semantic Associations for Mirror Detection⭐code

Two-Hand Reconstruction

Interacting Attention Graph for Single Image Two-Hand Reconstruction⭐code

Image Vectorization

Towards Layer-wise Image Vectorization⭐code

Action Learning

Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency⭐code

BNN

PokeBNN: A Binary Pursuit of Lightweight Accuracy⭐code

CNN

Condensing CNNs With Partial Differential Equations⭐code

Place Recognition

TransVPR: Transformer-based place recognition with multi-level attention aggregation😮oral

Object Recognition

AirObject: A Temporally Evolving Graph Embedding for Object Identification⭐code

Edge Detection

EDTER: Edge Detection with Transformer⭐code

Defect Detection

Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning Open-Set Recognition(开集识别) Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition⭐code Active Learning(主动学习) Active Learning for Open-Set Annotation Active Learning by Feature Mixing Towards Robust and Reproducible Active Learning Using Neural Networks⭐code Backdoor Attacks(后门攻击) DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints Better Trigger Inversion Optimization in Backdoor Scanning Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks😮oral⭐code Multi-view Clustering(多视图聚类) Highly-efficient Incomplete Large-scale Multi-view Clustering with Consensus Bipartite Graph⭐code Multi-Level Feature Learning for Contrastive Multi-View Clustering⭐code Deep Safe Multi-View Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase MPC: Multi-View Probabilistic Clustering Machine Translation(机器翻译) VALHALLA: Visual Hallucination for Machine Translation🏠project Object Counting(目标计数) Rethinking Spatial Invariance of Convolutional Networks for Object Counting⭐code📰解读 Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting⭐code computer-aided design (CAD) Neural Face Identification in a 2D Wireframe Projection of a Manifold Object⭐code JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints⭐code ROCA: Robust CAD Model Retrieval and Alignment from a Single Image⭐code CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings⭐code GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings Transfer Learning(迁移学习) Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning Graph Matching(图匹配) Graph-Context Attention Networks for Size-Varied Deep Graph Matching⭐code Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond⭐code Noise Modeling(图像噪声建模) Noise2NoiseFlow: 
Realistic Camera Noise Modeling Without Clean Images🏠project 60.Visual Emotion Analysis(视觉情感分析) MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis 59.动画 APES: Articulated Part Extraction From Sprite Sheets🏠project BANMo: Building Animatable 3D Neural Models From Many Casual Videos😮oral🏠project Neural Head Avatars From Monocular RGB Videos⭐code🏠project FLAG: Flow-Based 3D Avatar Generation From Sparse Observations🏠project 图像动画 Thin-Plate Spline Motion Model for Image Animation⭐code 人物动画 Structured Local Radiance Fields for Human Avatar Modeling 3D character animation(三维角色动画) 皮肤预测 SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters🏠project 3D 舞蹈生成 Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory⭐code A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres 静止图像到动画 Controllable Animation of Fluid Elements in Still Images🏠project 3D human avatars gDNA: Towards Generative Detailed Neural Avatars⭐code🏠project 58.Neural rendering(神经渲染) Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images😮oral🏠project SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction⭐code Modeling Indirect Illumination for Inverse Rendering⭐code🏠project GenDR: A Generalized Differentiable Renderer⭐code泛化可微渲染器 CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields⭐code🏠project NeRF-Editing: Geometry Editing of Neural Radiance Fields AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance Fields🏠project Neural Rays for Occlusion-Aware Image-Based Rendering⭐code🏠project EfficientNeRF Efficient Neural Radiance Fields⭐code CoNeRF: 
Controllable Neural Radiance Fields⭐code🏠project Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields🏠project Hallucinated Neural Radiance Fields in the Wild⭐code🏠project HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video😮oral⭐code🏠project📺video Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields Deblur-NeRF: Neural Radiance Fields From Blurry Images⭐code🏠project NeRFReN: Neural Radiance Fields With Reflections🏠project Depth-Supervised NeRF: Fewer Views and Faster Training for Free⭐code🏠project Dense Depth Priors for Neural Radiance Fields From Sparse Input Views⭐code🏠project📺video Light Field Neural Rendering⭐code🏠project InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering⭐code🏠project BokehMe: When Neural Rendering Meets Classical Rendering😮oral⭐code Plenoxels: Radiance Fields Without Neural Networks⭐code🏠project HDR-NeRF: High Dynamic Range Neural Radiance Fields Urban Radiance Fields🏠project Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations⭐code Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time⭐code🏠project Point-NeRF: Point-Based Neural Radiance Fields HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs🏠project Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation 57.Gaze Estimation(视线估计) GazeOnce: Real-Time Multi-Person Gaze Estimation Contrastive Regression for Domain Adaptation on Gaze Estimation Generalizing Gaze Estimation With Rotation Consistency GaTector: A Unified Framework for Gaze Object Prediction Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination🏠project 56.Sound Finding Fallen Objects via Asynchronous Audio-Visual Integration🏠project Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory MERLOT Reserve: Neural Script Knowledge Through Vision and Language and 
Sound⭐code🏠project Visual Acoustic Matching😮oral🏠project 声源定位 Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes⭐code Mix and Localize: Localizing Sound Sources in Mixtures A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos 音频配对 It's Time for Artistic Correspondence in Music and Video🏠project 语音克隆 V2C: Visual Voice Cloning⭐code 视听语音增强 Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis📺video 文本转语音 More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech⭐code 语音转人脸图像 Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?⭐code🏠project 语音分离 Reading To Listen at the Cocktail Party: Multi-Modal Speech Separation🏠project 语音手势生成 Low-Resource Adaptation for Personalized Co-Speech Gesture Generation🏠project 扬声器定位 Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization 语音手势生成 SEEG: Semantic Energized Co-Speech Gesture Generation⭐code 55.Novel View Synthesis(视图合成) NPBG++: Accelerating Neural Point-Based Graphics🏠project Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations🏠project AutoRF: Learning 3D Object Radiance Fields from Single View Observations🏠project NeurMiPs: Neural Mixture of Planar Experts for View Synthesis⭐code🏠project📺video📰解读 GeoNeRF: Generalizing NeRF with Geometry Priors⭐code🏠project📺video FWD: R eal-Time Novel View Synthesis With Forward Warping and Depth⭐code Block-NeRF: Scalable Large Scene Neural View Synthesis Boosting View Synthesis With Residual Transfer⭐code🏠project NeRF in the Dark: High Dynamic Range View Synthesis From Noisy Raw Images RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs😮oral⭐code🏠project📺video 视图连接 Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association⭐code 54.Dataset(数据集) ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real 
Transfer⭐code🏠project📰粗解 Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities⭐code🏠project 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos🌻dataset Hephaestus: A large scale multitask dataset towards InSAR understanding SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis🌻dataset AKB-48: A Real-World Articulated Object Knowledge Base⭐code📰粗解 Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes⭐code🏠project ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection一个Amodel实例分割网络和一个用于X射线废物检查的真实数据集 MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions🌻dataset一个可扩展的数据集，用于从电影音频描述中获得视频的Language Grounding DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation🌻dataset具有受控形状和材料变化的光度测量立体基准数据集 DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From a Single Image🌻dataset一个大规模的密集、准确和多样化的数据集，用于从单一图像中进行三维头部对准 Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task🌻dataset用于自主驾驶和单眼3D物体检测任务的路边感知数据集 Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions🌻dataset Open Challenges in Deep Stereo: The Booster Dataset🌻dataset RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation🏠project 卫星数据集 DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation 动物行为理解数据集 Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding😮oral🏠project🌻dataset 数据集(森林监测) The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift🌻dataset 3D目标理解 ABO: Dataset and Benchmarks for Real-World 3D Object Understanding🌻dataset 
数据集(AutoMine) AutoMine: An Unmanned Mine Dataset🌻dataset 数据集(人脸表情识别) FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos⭐code 数据集(手势识别) LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition⭐code 数据集(谷物识别) GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains🌻dataset 数据集(用于空间-时间行动、社会团体和活动检测) JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection)🌻dataset 53.Sign Language Translation(手语翻译) A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production MLSLT: Towards Multilingual Sign Language Translation🏠project 手语识别 C2SLR: Consistency-Enhanced Continuous Sign Language Recognition 52.Human Motion Forecasting(人体运动预测) Motron: Multimodal Probabilistic Human Motion Forecasting⭐code Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction⭐code Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction MotionAug: Augmentation With Physical Correction for Human Motion Prediction⭐code Future Transformer for Long-term Action Anticipation⭐code🏠project Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction⭐code Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement⭐code Multi-Person Extreme Motion Prediction 51.光学、几何、光场成像 Compressive Single-Photon 3D Cameras Fisher Information Guidance for Learned Time-of-Flight Imaging Light Field(光场) Occlusion-Aware Cost Constructor for Light Field Depth Estimation⭐code📰粗解 Neural Point Light Fields⭐code🏠project Acquiring a Dynamic Light Field Through a Single-Shot Coded Image Learning Neural Light Fields With Ray-Space Embedding⭐code🏠project 深度重建 Deep 
Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection⭐code🏠project📺video 快门校正 Learning Adaptive Warping for Real-World Rolling Shutter Correction⭐code 热红外成像 Infrared Invisible Clothing:Hiding from Infrared Detectors at Multiple Angles in Real World😮oral 相机姿势估计 DiffPoseNet: Direct Differentiable Camera Pose Estimation 相机重定位 SceneSqueezer: Learning to Compress Scene for Camera Relocalization😮oral 成像 Adaptive Gating for Single-Photon 3D Imaging All-photon Polarimetric Time-of-Flight Imaging Computing Wasserstein-p Distance Between Images With Linear Cost⭐code 光学 Quantization-aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging⭐code Dual-Shutter Optical Vibration Sensing 相机姿势 Camera Pose Estimation Using Implicit Distortion Models 相机成像 Learning to Zoom Inside Camera Imaging Pipeline 相机定位 Learning To Detect Scene Landmarks for Camera Localization⭐code 孔径成像 Synthetic Aperture Imaging With Events and Frames⭐code 高光谱成像 Real-Time Hyperspectral Imaging in Hardware via Trained Metasurface Encoders⭐code 50.Anomaly Detection(异常检测) Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection⭐code Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection⭐code Anomaly Detection via Reverse Distillation From One-Class Embedding Towards Total Recall in Industrial Anomaly Detection⭐code 离群点检测 Robust outlier detection by de-biasing VAE likelihoods 49.Image Geo-localization(图像地理定位) TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization⭐code 视觉地理定位 Rethinking Visual Geo-localization for Large-Scale Applications⭐code Deep Visual Geo-localization Benchmark😮oral🏠project 轨迹重建 MonoTrack: Shuttle trajectory reconstruction from monocular badminton video 48.Visual Grounding Multi-View Transformer for 3D Visual Grounding⭐code Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning⭐code视觉定位，通过自然语言定位目标位置（很有意思的研究） Shifting More Attention to Visual Backbone: Query-Modulated Refinement 
Networks for End-to-End Visual Grounding⭐code Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding⭐code Multi-Modal Dynamic Graph Transformer for Visual Grounding⭐code 47.Few/Zero-Shot Learning/Domain Generalization/Adaptation(小/零样本/域泛化/适应) 小样本 Ranking Distance Calibration for Cross-Domain Few-Shot Learning Few-shot Learning with Noisy Labels Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference🏠project📺video Few-shot Backdoor Defense Using Shapley Estimation📰解读 Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning⭐code EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning⭐code Semi-Supervised Few-Shot Learning via Multi-Factor Clustering⭐code Cross-Domain Few-Shot Learning With Task-Specific Adapters⭐code 零样本 MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning⭐code📰粗解 Unseen Classes at a Later Time? No Problem⭐code En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning📰解读 Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning⭐code KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning⭐code📰解读 Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks Distinguishing Unseen From Seen for Generalized Zero-Shot Learning VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning⭐code📰零样本学习，大幅减少人工标注！马普所和北邮提出富含视觉信息的类别语义嵌入 Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language⭐code 域泛化 Compound Domain Generalization via Meta-Knowledge Encoding Causality Inspired Representation Learning for Domain Generalization⭐code Towards Unsupervised Domain Generalization📰CVPR 2022丨清华大学提出：无监督域泛化 (UDG)本次任务的主要目标是域泛化（domain 
generalization(DG)），是首篇将DG推广到unsupervised learning 领域的，并提出一个新的研究领域 unsupervised domain generalization(UDG)。 Towards Principled Disentanglement for Domain Generalization😮oral⭐code Meta Convolutional Neural Networks for Single Domain Generalization PCL: Proxy-Based Contrastive Learning for Domain Generalization Localized Adversarial Domain Generalization Unsupervised Domain Generalization by Learning a Bridge Across Domains Style Neophile: Constantly Seeking Novel Styles for Domain Generalization BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features Failure Modes of Domain Generalization Algorithms Geometric and Textural Augmentation for Domain Gap Reduction⭐code Revisiting Domain Generalized Stereo Matching Networks From a Feature Consistency Perspective⭐code 域外泛化 The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization 域适应 Continual Test-Time Domain Adaptation⭐code Safe Self-Refinement for Transformer-based Domain Adaptation⭐code📰解读 Source-Free Domain Adaptation via Distribution Estimation📰解读 Learning Distinctive Margin toward Active Domain Adaptation⭐code📰解读 DINE: Domain Adaptation from Single and Multiple Black-box Predictors⭐code Exploring Domain-Invariant Parameters for Source Free Domain Adaptation Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal⭐code No-Reference Point Cloud Quality Assessment via Domain Adaptation⭐code Slimmable Domain Adaptation⭐code SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation 无监督域适应 Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation⭐code Category Contrast for Unsupervised Domain Adaptation in Visual Tasks The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization⭐code Spectral Unsupervised Domain Adaptation for 
Visual Recognition 46.Scene Graph Generation(场景图生成) PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation Fine-Grained Predicates Learning for Scene Graph Generation⭐code HL-Net: Heterophily Learning Network for Scene Graph Generatio⭐code场景图生成：异质学习网络📰解读 RU-Net: Regularized Unrolling Network for Scene Graph Generation⭐code场景图生成：正则展开网络📰解读 The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation⭐code Dynamic Scene Graph Generation via Anticipatory Pre-Training Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation⭐code Structured Sparse R-CNN for Direct Scene Graph Generation⭐code HL-Net: Heterophily Learning Network for Scene Graph Generation⭐code Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation SGTR: End-to-end Scene Graph Generation with Transformer⭐code 视频场景图生成 Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs⭐code 45.Dense Prediction(密集预测) Does Robustness on ImageNet Transfer to Downstream Tasks? 
MPViT: Multi-Path Vision Transformer for Dense Prediction⭐code Learning Multiple Dense Prediction Tasks From Partially Annotated Data⭐code 44.Federated Learning(联邦学习) CD2-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage⭐code FedCorr: Multi-Stage Federated Learning for Label Noise Correction⭐code Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning Layer-Wised Model Aggregation for Personalized Federated Learning Federated Learning With Position-Aware Neurons Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning⭐code FedDC: Federated Learning With Non-IID Data via Local Drift Decoupling and Correction⭐code Learn From Others and Be Yourself in Heterogeneous Federated Learning⭐code FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning Robust Federated Learning With Noisy and Heterogeneous Clients⭐code ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning⭐code 43.Multi-Task Learning(多任务学习) Controllable Dynamic Multi-Task Architectures🏠project Task Adaptive Parameter Sharing for Multi-Task Learning Raw High-Definition Radar for Multi-Task Learning⭐code 42.Metric Learning(度量学习) Self-Taught Metric Learning without Labels⭐code🏠project Enhancing Adversarial Robustness for Deep Metric Learning Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning⭐code Non-Isotropy Regularization for Proxy-Based Deep Metric Learning⭐code Hyperbolic Vision Transformers: Combining Improvements in Metric Learning⭐code Enhancing Adversarial Robustness for Deep Metric Learning Weakly-Supervised Metric Learning With Cross-Module Communications for the Classification of Anterior Chamber Angle Images⭐code Integrating Language 
Guidance Into Vision-Based Deep Metric Learning⭐code 41.Incremental Learning(增量学习) 增量学习 Energy-based Latent Aligner for Incremental Learning⭐code General Incremental Learning with Domain-aware Categorical Representations Forward Compatible Few-Shot Class-Incremental Learning⭐code Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning⭐code Few-Shot Incremental Learning for Label-to-Image Translation 类增量学习 Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches Constrained Few-shot Class-incremental Learning⭐code Class-Incremental Learning with Strong Pre-trained Models Class-Incremental Learning by Knowledge Distillation With Adaptive Feature Consolidation⭐code Bring Evanescent Representations to Life in Lifelong Class Incremental Learning Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning Federated Class-Incremental Learning⭐code vCLIMB: A Novel Video Class Incremental Learning Benchmark😮oral⭐code🏠project 40.Adversarial Learning(对抗学习) Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network Towards Practical Certifiable Patch Defense with Vision Transformer📰解读 Enhancing Adversarial Training with Second-Order Statistics of Weights⭐code Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack⭐code Improving Adversarial Transferability via Neuron Attribution-Based Attacks⭐code Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart⭐code Bounded Adversarial Attack on Deep Content Features Subspace Adversarial Training⭐code Cross-Modal Transferable Adversarial Attacks From Images to Videos⭐code Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training⭐code Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free⭐code Robust 
Combination of Distributed Gradients Under Adversarial Perturbations Adversarial Texture for Fooling Person Detectors in the Physical World DTA: Physical Camouflage Attacks Using Differentiable Transformation Network🏠project BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks via Image Quantization and Contrastive Adversarial Learning⭐code Pyramid Adversarial Training Improves ViT Performance🏠project NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning 对抗样本 Label-Only Model Inversion Attacks via Boundary Repulsion⭐code Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection⭐code Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input⭐code Leveraging Adversarial Examples To Quantify Membership Information Leakage⭐code 对抗攻击 Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon⭐code Transferable Sparse Adversarial Attack⭐code Towards Efficient Data Free Black-Box Adversarial Attack Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity⭐code Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability⭐code 黑盒 Investigating Top-k White-Box and Transferable Black-box Attack⭐code DST: Dynamic Substitute Training for Data-free Black-box Attack🏠project Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees😮oral⭐code Adversarial Eigen Attack on Black-Box Models Exploring Effective Data for Surrogate Training Towards Black-Box Attack⭐code Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution⭐code 对抗训练 LAS-AT: Adversarial Training with Learnable Attack Strategy😮oral⭐code📰CVPR 2022 中科院、腾讯提出LAS-AT，利用“可学习攻击策略”进行“对抗训练” 39.Continual Learning(持续学习) On Generalizing Beyond Domains in Cross-Domain Continual Learning Probing Representation Forgetting in 
Supervised and Unsupervised Continual Learning⭐code Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries⭐code Learning To Prompt for Continual Learning⭐code Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency⭐code Continual Learning for Visual Search With Backward Consistent Feature Embedding⭐code Meta-Attention for ViT-Backed Continual Learning⭐code Continual Learning with Lifelong Vision Transformer📰解读 DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion⭐code GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning🏠project 38.Meta-Learning(元学习) What Matters For Meta-Learning Vision Regression Tasks?⭐code Multidimensional Belief Quantification for Label-Efficient Meta-Learning Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning⭐code Learning to Learn and Remember Super Long Multi-Domain Task Sequence😮oral⭐code📰解读 37.Contrastive Learning(对比学习) Selective-Supervised Contrastive Learning with Noisy Labels⭐code📰粗解 Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning⭐code Cam-Ready: UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning⭐code Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework⭐code Crafting Better Contrastive Views for Siamese Representation Learning😮oral⭐code Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo⭐code Estimating Fine-Grained Noise Model via Contrastive Learning Contextual Outpainting With Object-Level Contrastive Learning🏠project Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning Noise Is Also Useful: 
Negative Correlation-Steered Latent Contrastive Learning On Learning Contrastive Representations for Learning With Noisy Labels Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity Robust Contrastive Learning Against Noisy Views⭐code Unified Contrastive Learning in Image-Text-Label Space⭐code Consistent Explanations by Contrastive Learning⭐code Rethinking Minimal Sufficient Representation in Contrastive Learning⭐code Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining🌻dataset Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization⭐code Unpaired Deep Image Deraining Using Dual Contrastive Learning⭐code🏠project 36.Optical Flow CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow⭐code DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow⭐code Imposing Consistency for Optical Flow Estimation Deep Equilibrium Optical Flow Estimation⭐code📰explainer GMFlow: Learning Optical Flow via Global Matching😮oral⭐code📰explainer Optical Flow Estimation for Spiking Camera⭐code Learning Optical Flow with Kernel Patch Attention⭐code📰explainer CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation⭐code Global Matching With Overlapping Attention for Optical Flow Estimation⭐code Towards Understanding Adversarial Robustness of Optical Flow Networks⭐code 35.OCR XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding SwinTextSpotter: Scene Text Spotting via Better Synergy Between Text Detection and Text Recognition⭐code Scene Text Detection Towards End-to-End Unified Scene Text Detection and Layout Analysis⭐code Pushing the Performance Limit of Scene Text Recognizer without Human Annotation Vision-Language Pre-Training for Boosting Scene Text Detectors⭐code (vision-language pre-training for scene text detection; the code will be open-sourced, but its address has not been published yet) Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text
Detection Scene Text Recognition SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization⭐code Text Spotting Text Spotting Transformers⭐code📰brief explainer Towards Weakly-Supervised Text Spotting Using a Multi-Task Transformer Logo Design Aesthetic Text Logo Synthesis via Content-aware Layout Inferring⭐code📰CVPR 2022 | Peking University and Tencent propose a text logo generation model whose imagination rivals a designer's Font Generation XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation (Oral)Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator (font generation, a direction with strong commercial value) Few-Shot Font Generation by Learning Fine-Grained Local Styles Text Recognition Open-set Text Recognition via Character-Context Decoupling Table Structure Recognition Neural Collaborative Graph Machines for Table Structure Recognition📰explainer Text Aesthetics Prediction and Evaluation Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method⭐code Table Structure Understanding TableFormer: Table Structure Understanding with Transformers Text Segmentation BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild Table Detection PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents⭐code Document Restoration Fourier Document Restoration for Robust Document Dewarping and Recognition🏠project Handwritten Mathematical Expression Recognition Syntax-Aware Network for Handwritten Mathematical Expression Recognition 34.Model Compression/Knowledge Distillation/Pruning Knowledge Distillation Knowledge Distillation with the Reused Teacher Classifier DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers📰explainer Decoupled Knowledge Distillation⭐code📰Decoupled knowledge distillation brings the method Hinton proposed 7 years ago back into the SOTA ranks Knowledge Distillation via the Target-aware Transformer😮oral⭐code📰RMIT, Alibaba, UTS, and Sun Yat-sen University propose Target-aware Transformer for one-to-all knowledge distillation, with SOTA performance! Evaluation-oriented Knowledge Distillation for Deep Face Recognition😮oral⭐code📰explainer 1📰explainer 2 Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation⭐code Self-Distillation From the Last Mini-Batch for Consistency Regularization⭐code Knowledge Distillation As Efficient Pre-Training:
Faster Convergence, Higher Data-Efficiency, and Better Transferability⭐code Knowledge Distillation: A Good Teacher Is Patient and Consistent PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models⭐code Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation Model Compression CHEX: CHannel EXploration for CNN Model Compression DiSparse: Disentangled Sparsification for Multitask Model Compression⭐code Pruning Revisiting Random Channel Pruning for Neural Network Compression⭐code📰explainer Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction When To Prune? A Policy Towards Early Structural Pruning Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs Quantization A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information⭐code🏠project Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing the Reconstruction Error Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation⭐code AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation⭐code Data-Free Network Compression via Parametric Non-Uniform Mixed Precision Quantization Mutual Quantization for Cross-Modal Search With Noisy Labels Instance-Aware Dynamic Neural Network Quantization⭐code IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization⭐code Learnable Lookup Table for Neural Network Quantization Channel Balancing for Accurate Quantization of Winograd Convolutions Hyperparameter Optimization AME: Attention and Memory Enhancement in Hyper-Parameter Optimization 33.Human-Object Interaction HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction⭐code MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection⭐code
Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection⭐code OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction⭐code📰brief explainer D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions🏠code Learning Transferable Human-Object Interaction Detector With Natural Language Supervision⭐code What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions😮oral Human-Object Interaction Detection via Disentangled Transformer Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection⭐code📰explainer Interactiveness Field in Human-Object Interactions⭐code📰brief explainer Stability-driven Contact Reconstruction From Monocular Color Images⭐code (hand-object interaction reconstruction from monocular color images; human-computer interaction) Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection⭐code📰explainer 1📰explainer 2 Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions😮oral⭐code Efficient Two-Stage Detection of Human-Object Interactions With a Novel Unary-Pairwise Transformer🏠project NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions Category-Aware Transformer Network for Better Human-Object Interaction Detection HOI Tracking BEHAVE: Dataset and Method for Tracking Human Object Interactions🏠project 32.Data Augmentation 🐦️AlignMix: Improving representation by interpolating aligned features 3D Common Corruptions and Data Augmentation⭐code🏠project📺video📰brief explainer Kubric: A scalable dataset generator⭐code Robust Optimization As Data Augmentation for Large-Scale Graphs⭐code AIM: an Auto-Augmenter for Images and Meshes⭐code Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation🏠project TeachAugment: Data Augmentation Optimization Using Teacher Knowledge😮oral⭐code 31.Vision-Language Unsupervised Vision-Language Parsing: Seamlessly
Bridging Visual Scene Graphs with Language Structures via Dependency Relationships⭐code VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers⭐code Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality🌻dataset Robust Cross-Modal Representation Learning with Progressive Self-Distillation Prompt Distribution Learning (on downstream recognition tasks, the authors' method shows consistent performance gains across 12 datasets) Vision-Language Pre-Training with Triple Contrastive Learning⭐code Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning⭐code📰UCAS and CUHK propose a Visual Grounding framework with visual-linguistic verification and iterative reasoning; SOTA performance, code open-sourced! Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture⭐code🏠project VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks⭐code Lite-MDETR: A Lightweight Multi-Modal Detector Align and Prompt: Video-and-Language Pre-Training With Entity Prompts⭐code Unsupervised Vision-and-Language Pre-Training via Retrieval-Based Multi-Granular Alignment RegionCLIP: Region-based Language-Image Pretraining (https://github.com/microsoft/RegionCLIP) Grounded Language-Image Pre-Training⭐code Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions⭐code Conditional Prompt Learning for Vision-Language Models⭐code Multi-Modal Alignment Using Representation Codebook NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks😮oral⭐code An Empirical Study of Training End-to-End Vision-and-Language Transformers⭐code DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting⭐code FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback⭐code🏠project CLIP-Event: Connecting Text and Images With Event Structures⭐code Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model⭐code VLN EnvEdit: Environment Editing for Vision-and-Language
Navigation⭐code Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation⭐code Reinforced Structured State-Evolution for Vision-Language Navigation⭐code📰explainer Cross-modal Map Learning for Vision and Language Navigation⭐code🏠project One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones What do navigation agents learn about their environment? Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation⭐code ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation⭐code Video-Text Representation Learning Video-Text Representation Learning via Differentiable Weak Temporal Alignment⭐code Visual Representation Learning Unsupervised Visual Representation Learning by Online Constrained K-Means When Does Contrastive Visual Representation Learning Work? Visual Navigation PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning⭐code🏠project Visual Description Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models 30.Visual Question Answering VQA SimVQA: Exploring Simulated Environments for Visual Question Answering🏠project SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering⭐code📰brief explainer V-Doc: Visual Questions Answers With Documents⭐code Grounding Answers for Visual Questions Asked by Visually Impaired People🏠project Query and Attention Augmentation for Knowledge-Based Explainable Reasoning⭐code MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering⭐code Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering LaTr: Layout-Aware Transformer for Scene-Text VQA WebQA: Multihop and Multimodal QA AVQA Learning to Answer Questions in Dynamic Audio-Visual Scenarios😮oral⭐code📰CVPR 2022 Oral | Gaoling School of Artificial Intelligence at Renmin University proposes a question-answering learning task for dynamic audio-visual scenes Dual-Key Multimodal
Backdoors for Visual Question Answering⭐code Maintaining Reasoning Consistency in Compositional Visual Question Answering⭐code From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering⭐code Video-QA Measuring Compositional Consistency for Video Question Answering Invariant Grounding for Video Question Answering😮oral⭐code📰explainer 29.SLAM/Augmented Reality/Virtual Reality/Robotics SLAM NICE-SLAM: Neural Implicit Scalable Encoding for SLAM⭐code🏠project📺video Object Goal Navigation Online Learning of Reusable Abstract Models for Object Goal Navigation Is Mapping Necessary for Realistic PointGoal Navigation?⭐code🏠project try-on Dressing in the Wild by Watching Dance Videos🏠project Style-Based Global Appearance Flow for Virtual Try-On⭐code ClothFormer: Taming Video Virtual Try-On in All Module😮oral⭐code🏠project📰explainer Weakly Supervised High-Fidelity Clothing Model Generation Full-Range Virtual Try-On With Recurrent Tri-Level Transform🏠project AR Episodic Memory Question Answering😮oral⭐code (AI assistant: episodic memory question answering, a new augmented reality task; data and code will both be open-sourced) Robotics Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation via Discretisation Hand-Object Pose Estimation ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis⭐code📰brief explainer Robot Navigation Coupling Vision and Proprioception for Navigation of Legged Robots⭐code🏠project📺video 28.Style Transfer Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer⭐code Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation⭐code Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization😮oral⭐code HEAT: Holistic Edge Attention Transformer for Structured Reconstruction⭐code StyTr2: Image Style Transfer With Transformers⭐code CLIPstyler: Image Style Transfer With a Single Text Condition⭐code Motion Style Transfer Style-ERD: Responsive and Coherent Online Motion Style
Transfer Motion Transfer Structure-Aware Motion Transfer with Deformable Anchor Model⭐code📰explainer Scene Stylization StylizedNeRF: Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning Appearance Transfer Splicing ViT Features for Semantic Appearance Transfer😮oral⭐code🏠project Stylization Text2Mesh: Text-Driven Neural Stylization for Meshes⭐code🏠project 3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image⭐code🏠project 27.Pose Estimation (Object Pose Estimation) OSOP: A Multi-Stage One Shot Object Pose Estimation Framework OnePose: One-Shot Object Pose Estimation without CAD Models⭐code🏠project📰explainer ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo On the Instability of Relative Pose Estimation and RANSAC's Role SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings⭐code🏠project ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes⭐code🏠project📺video GPV-Pose: Category-Level Object Pose Estimation via Geometry-Guided Point-Wise Voting UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation 4D Revealing Occlusions with 4D Neural Fields😮oral⭐code🏠project Ego4D: Around the World in 3,000 Hours of Egocentric Video⭐code 9D CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild⭐code📰brief explainer📓 Monocular Object Pose Estimation EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation⭐code 6D RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization⭐code FS6D: Few-Shot 6D Pose Estimation of Novel Objects⭐code🏠project📰explainer Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework⭐code Focal Length and Object Pose Estimation via Render and Compare⭐code🏠project📰explainer DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation⭐code🏠project📰explainer
Coupled Iterative Refinement for 6D Multi-Object Pose Estimation⭐code📰explainer ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation⭐code Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation⭐code OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation⭐code SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation⭐code🏠project 3D Object Articulation Understanding 3D Object Articulation in Internet Videos🏠project 3Dope Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions⭐code 26.GCN/GNN GNN 🐦️Lifelong Graph Learning⭐code AEGNN: Asynchronous Event-based Graph Neural Networks⭐code🏠project "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks😮oral⭐code ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching 25.Fine-Grained/Image Classification Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification A Voxel Graph CNN for Object Classification with Event Cameras Multi-Modal Extreme Classification⭐code Fine-Grained Classification Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information⭐code📰brief explainer📓brief explainer Fine-Grained Object Classification via Self-Supervised Pose Alignment⭐code Image Classification Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification⭐code DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification⭐code Contrastive Test-Time Adaptation🏠project A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes VisCUIT: Visual Auditor for Bias in CNN Image Classifier📺video Multi-Label Iterated Learning for Image Classification With Label
Ambiguity⭐code Efficient Classification of Very Large Images With Tiny Objects Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification⭐code Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes⭐code Few-Shot Classification CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification Matching Feature Sets for Few-Shot Image Classification⭐code🏠project📺video Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification😮oral⭐code🏠project📰explainer Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification📰explainer Generating Representative Samples for Few-Shot Classification⭐code📰brief explainer (for few-shot classification, generating more representative samples and removing non-representative ones improves classification and achieves SOTA results) Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations Task Discrepancy Maximization for Fine-Grained Few-Shot Classification Few-Shot Classification and Segmentation (FS-CS) Integrative Few-Shot Learning for Classification and Segmentation⭐code Long-Tailed Recognition Nested Collaborative Learning for Long-Tailed Visual Recognition⭐code Long-Tailed Recognition via Weight Balancing⭐code Targeted Supervised Contrastive Learning for Long-Tailed Recognition Long-Tail Recognition via Compositional Knowledge Transfer RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition⭐code Trustworthy Long-Tailed Classification⭐code Balanced Contrastive Learning for Long-Tailed Visual Recognition The Majority Can Help the Minority: Context-Rich Minority Oversampling for Long-Tailed Classification⭐code Retrieval Augmented Classification for Long-Tail Visual Recognition Long-Tailed Visual Recognition via Gaussian Clouded Logit Adjustment⭐code Fine-Grained Recognition Knowledge Mining with Scene Text for Fine-Grained Recognition⭐code📰explainer Multi-Label Classification Large Loss Matters in Weakly Supervised Multi-Label Classification⭐code🏠project Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss⭐code Class-Imbalanced Classification A Re-Balancing Strategy for Class-Imbalanced
Classification Based on Instance Difficulty Image-Text Multimodal Classification Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification 24.Super-Resolution Learning Graph Regularisation for Guided Super-Resolution⭐code Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites⭐code🏠project📰explainer Deep Constrained Least Squares for Blind Image Super-Resolution⭐code📰explainer Discrete Cosine Transform Network for Guided Depth Map Super-Resolution😮oral⭐code Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution⭐code LAR-SR: A Local Autoregressive Model for Image Super-Resolution VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution⭐code Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel⭐code Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution⭐code SphereSR: 360° Image Super-Resolution With Arbitrary Projection via Continuous Spherical Image Representation Reflash Dropout in Image Super-Resolution GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors⭐code Learning the Degradation Distribution for Blind Image Super-Resolution⭐code Texture-Based Error Analysis for Image Super-Resolution A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution⭐code Task Decoupled Framework for Reference-Based Super-Resolution MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution⭐code VSR Stable Long-Term Recurrent Video Super-Resolution Reference-based Video Super-Resolution Using Multi-Camera Video Triplets⭐code Learning Trajectory-Aware Transformer for Video Super-Resolution😮oral⭐code Investigating Tradeoffs in Real-World Video Super-Resolution⭐code📰explainer BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment⭐code🏠project📺video🏆winner of the NTIRE 2021 video restoration and enhancement challenge Look Back and
Forth: Video Super-Resolution with Explicit Temporal Difference Modeling📰ETDM: video super-resolution based on explicit temporal difference modeling Memory-Augmented Non-Local Attention for Video Super-Resolution⭐code📰explainer Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning⭐code📰explainer RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution⭐code 23.Image Retrieval Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval⭐code Correlation Verification for Image Retrieval😮oral⭐code Sketch3T: Test-Time Training for Zero-Shot SBIR Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image⭐code Forward Compatible Training for Large-Scale Embedding Retrieval Systems⭐code Contextual Similarity Distillation for Asymmetric Image Retrieval Object-Aware Video-Language Pre-Training for Retrieval⭐code Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features Video Retrieval Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval⭐code Text-Video Retrieval X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval🏠project📰X-Pool: University of Toronto proposes text-conditioned video aggregation, reaching SOTA performance on video-text retrieval! Bridging Video-text Retrieval with Multiple Choice Questions⭐code📰BridgeFormer: HKU, Tencent, and UC Berkeley propose a video-text retrieval model with a multiple-choice task, with SOTA performance! Cross-Modal Retrieval ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval Cross Modal Retrieval With Querybank Normalisation⭐code🏠project EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval Sign Language Video Retrieval Sign Language Video Retrieval With Free-Form Textual Queries🏠project 22.Image Synthesis/Generation Interactive Image Synthesis with Panoptic Layout Generation⭐code Autoregressive Image Generation using Residual Quantization⭐code📰brief explainer GIRAFFE HD: A High-Resolution 3D-aware Generative Model Arbitrary-Scale Image Synthesis⭐code📰brief explainer Multi-View Consistent Generative
Adversarial Networks for 3D-aware Image Synthesis⭐code📰explainer Neural Texture Extraction and Distribution for Controllable Person Image Synthesis⭐code Unpaired Cartoon Image Synthesis via Gated Cycle Mapping 3D Scene Painting via Semantic Image Synthesis 3D-Aware Image Synthesis via Learning Structural and Textural Representations⭐code🏠project📺video High-Resolution Image Synthesis With Latent Diffusion Models⭐code Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis⭐code DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis⭐code Cluster-Guided Image Synthesis With Unconditional Models Day-to-Night Image Synthesis for Training Nighttime Neural ISPs😮oral⭐code Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis⭐code Modulated Contrast for Versatile Image Synthesis⭐code Text-Guided Image Manipulation ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation😮oral🏠project Pose-Guided Image Synthesis Exploring Dual-task Correlation for Pose Guided Person Image Generation⭐code📰brief explainer Text-to-Image Synthesis StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis Text-to-Image Synthesis based on Object-Guided Joint-Decoding Transformer📰explainer LAFITE: Towards Language-Free Training for Text-to-Image Generation⭐code DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis😮oral⭐code Text to Image Generation With Semantic-Spatial Aware GAN⭐code Vector Quantized Diffusion Model for Text-to-Image Synthesis⭐code Image Translation FlexIT: Towards Flexible Semantic Image Translation⭐code A Style-aware Discriminator for Controllable Image Translation Image Generation Marginal Contrastive Correspondence for Guided Image Generation😮oral OSSGAN: Open-Set Semi-Supervised Image Generation⭐code A Closer Look at Few-shot Image Generation Modeling Image Composition for Complex Scene Generation⭐code📰explainer Local Attention Pyramid for Scene Image Generation GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation🏠project
MaskGIT: Masked Generative Image Transformer Attribute Group Editing for Reliable Few-Shot Image Generation⭐code Learning to Memorize Feature Hallucination for One-Shot Image Generation📰explainer StyleSwin: Transformer-Based GAN for High-Resolution Image Generation⭐code Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation Image-to-Text ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic⭐code Text-to-Shape Generation CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation⭐code Image-to-Video Generation Make It Move: Controllable Image-to-Video Generation With Text Descriptions⭐code Text-Based Object Generation Zero-Shot Text-Guided Object Generation With Dream Fields⭐code🏠project Person Image Generation Self-supervised Correlation Mining Network for Person Image Generation Image-Text Matching Negative-Aware Attention Framework for Image-Text Matching⭐code Bidirectional Image-Text Generation L-Verse: Bidirectional Generation Between Image and Text⭐code 21.UAV/Remote Sensing/Satellite Image CVNet: Contour Vibration Network for Building Extraction⭐code CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data🏠project Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks⭐code Remote Sensing Image Fusion HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening⭐code📰brief explainer Aerial Image Segmentation Revisiting Near/Remote Sensing with Geospatial Attention Aerial Image Detection Oriented RepPoints for Aerial Object Detection⭐code Satellite Imagery PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images⭐code 20.Autonomous vehicles Autonomous Driving Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data⭐code Exploiting Temporal Relations on Radar Perception for Autonomous Driving COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles⭐code🏠project📰explainer Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior🏠project Learning From All Vehicles⭐code Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous
Driving Unifying Panoptic Segmentation for Autonomous Driving Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving⭐code On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles⭐code Lane Detection Rethinking Efficient Lane Detection via Curve Modeling⭐code📰brief explainer📓 Towards Driving-Oriented Metric for Lane Detection Models A Keypoint-based Global Association Network for Lane Detection⭐code📰explainer Monocular 3D Lane Detection ONCE-3DLanes: Building Monocular 3D Lane Detection⭐code (a further evolution of lane detection techniques) Lane Description Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes⭐code CLRNet: Cross Layer Refinement Network for Lane Detection⭐code📰explainer Scene Relighting for Autonomous Driving SIMBAR: Single Image-Based Scene Relighting For Effective Data Augmentation For Automated Driving Vision Tasks🏠project Pedestrian Trajectory Prediction Graph-based Spatial Transformer with Memory Replay for Multi-future Pedestrian Trajectory Prediction⭐code📰explainer ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework Human Trajectory Prediction with Momentary Observation📰brief explainer Trajectory Prediction MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction Remember Intentions: Retrospective-Memory-Based Trajectory Prediction⭐code LTP: Lane-Based Trajectory Prediction for Autonomous Driving Vehicle trajectory prediction works, but not everywhere🏠project End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps⭐code Whose Track Is It Anyway?
Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction Adaptive Trajectory Prediction via Transferable GNN M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning⭐code Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective⭐code ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning⭐code Vehicle Detection Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection 19.Neural Architecture Search 🐦️ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior⭐code Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search⭐code📰explainer GPUNet: Searching the Deployable Convolution Neural Networks for GPUs (neural architecture search: lightweight network architecture search for GPU deployment; faster than Google's EfficientNet-X series and Meta's FBNetV3, with even better accuracy; the authors are from NVIDIA) Distribution Consistent Neural Architecture Search Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule GreedyNASv2: Greedier Search With a Greedy Path Filter Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning⭐code Neural Architecture Search with Representation Mutual Information⭐code Demystifying the Neural Tangent Kernel From a Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training?⭐code β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search⭐code Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search⭐code 18.Person Re-Identification Group Re-Identification Modeling 3D Layout for Group Re-Identification⭐code Reid Part-based Pseudo Label Refinement for Unsupervised Person Re-identification⭐code Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification⭐code FMCNet: Feature-Level Modality
Compensation for Visible-Infrared Person Re-Identification Large-Scale Pre-training for Person Re-identification with Noisy Labels⭐code Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification⭐code Implicit Sample Extension for Unsupervised Person Re-Identification⭐code📰explainer Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification⭐code NFormer: Robust Person Re-identification with Neighbor Transformer⭐code📰explainer Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification⭐code Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation🏠project Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification Augmented Geometric Distillation for Data-Free Incremental Person ReID⭐code Salient-to-Broad Transition for Video Person Re-Identification⭐code Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification⭐code Meta Distribution Alignment for Generalizable Person Re-Identification⭐code AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification Id-Free Person Similarity Learning Clothes-Changing Person Re-Identification Clothes-Changing Person Re-identification with RGB Modality Only⭐code📰explainer Cloth-Changing Person Re-Identification From a Single Image With Gait Prediction and Regularization⭐code Occluded Person Re-Identification Feature Erasing and Diffusion Network for Occluded Person Re-Identification Crowd Counting Leveraging Self-Supervision for Cross-Domain Crowd Counting⭐code Boosting Crowd Counting via Multifaceted Attention⭐code Bi-level Alignment for Cross-Domain Crowd
Counting⭐code📰explainer
Crowd Counting in the Frequency Domain

Pedestrian Detection
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes⭐code

Gait Recognition
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark⭐code🏠project📰explainer
Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition⭐code

Person Search
PSTR: End-to-End One-Step Person Search With Transformers⭐code
Cascade Transformers for End-to-End Person Search⭐code

17.Medical Image
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations😮oral
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation⭐code
DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis⭐code📰explainer
Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning⭐code🏠project
What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements⭐code
ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics⭐code

3D Bioprinting
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers

SR (MRI)
Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution⭐code

Medical Image Segmentation
CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision⭐code
C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image⭐code
HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet
Closing the Generalization Gap of Cross-Silo Federated Medical
Image Segmentation⭐code
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation⭐code

Medical Image Registration
Affine Medical Image Registration with Coarse-to-Fine Vision Transformer⭐code

Medical Image Analysis
FIBA: Frequency-Injection based Backdoor Attack in Medical Image Analysis⭐code📰explainer

Automatic Report Generation
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation

Medical Image Classification
ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification⭐code
M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer

CT Synthesis
Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis

Medical Landmark Detection
Which Images To Label for Few-Shot Medical Landmark Detection?

MRI
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks⭐code
Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction⭐code

Histopathology
Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images⭐code

Dental
Improving Segmentation of the Inferior Alveolar Nerve Through Deep Label Propagation🏠project

3D Medical Image Analysis
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis⭐code

3D Tooth Instance Segmentation
DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations

Malaria Detection
Towards Low-Cost and Efficient Malaria Detection🌻dataset

16.Semi/self-supervised learning

Self-Supervised
A study on the distribution of social biases in self-supervised learning visual models⭐code
Learning Where to Learn in Cross-View Self-Supervised Learning⭐code
Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy⭐code
DATA: Domain-Aware and Task-Aware Self-Supervised Learning⭐code
Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision⭐code
Self-Supervised Spatial Reasoning on Multi-View Line Drawings⭐code🏠project
Self-Supervised
Models Are Continual Learners⭐code
Learning Pixel Trajectories With Multiscale Contrastive Random Walks⭐code🏠project
Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning⭐code
Backdoor Attacks on Self-Supervised Learning⭐code
Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors🏠project
Masked Feature Prediction for Self-Supervised Visual Pre-Training⭐code
Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning⭐code
Patch-Level Representation Learning for Self-Supervised Vision Transformers
A Simple Data Mixing Prior for Improving Self-Supervised Learning⭐code
Sound and Visual Representation Learning With Multiple Pretraining Tasks
Align Representations With Base: A New Approach to Self-Supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training
Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework⭐code
SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos
Exploring Set Similarity for Dense Self-Supervised Representation Learning

Unsupervised
RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes⭐code
Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations
Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning⭐code
PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors⭐code
Beyond Supervised vs.
Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
Unsupervised Learning of Debiased Representations With Pseudo-Attributes⭐code

Semi-Supervised
Class-Aware Contrastive Semi-Supervised Learning⭐code📰explainer
RSCFed: Random Sampling Consensus Federated Semi-supervised Learning⭐code
FisherMatch: Semi-Supervised Rotation Regression via Entropy-based Filtering😮oral🏠project
Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels
SimMatch: Semi-Supervised Learning With Similarity Matching⭐code
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning⭐code
DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning⭐code🏠project
Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos⭐code
Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning
Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data
DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning

Weakly Supervised
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision⭐code
Revisiting Weakly Supervised Pre-Training of Visual Perception Models⭐code
Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis⭐code
Decoupling Makes Weakly Supervised Local Feature Better⭐code

15.Transformer
Vision Transformer With Deformable Attention⭐code
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts⭐code
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space⭐code
BoxeR: Box-Attention for 2D and 3D Transformers⭐code
Video Swin Transformer⭐code
APRIL: Finding the Achilles' Heel on Privacy for Vision Transformers
Fast Point Transformer⭐code
ChiTransformer: Towards Reliable Stereo from Cues
Beyond Fixation: Dynamic Window
Visual Transformer⭐code
Training-free Transformer Architecture Search📰explainer
Automated Progressive Learning for Efficient Training of Vision Transformers⭐code
Collaborative Transformers for Grounded Situation Recognition⭐code
TubeDETR: Spatio-Temporal Video Grounding with Transformers😮oral⭐code🏠project
Deformable Video Transformer
MixFormer: Mixing Features across Windows and Dimensions😮oral⭐code📰brief explainer
Are Multimodal Transformers Robust to Missing Modality?
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Multimodal Token Fusion for Vision Transformers⭐code
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer😮oral⭐code📰explainer
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Patch Slimming for Efficient Vision Transformers📰explainer
Swin Transformer V2: Scaling Up Capacity and Resolution⭐code📰Records broken by a wide margin! Swin Transformer v2.0 is here, with 3 billion parameters!
SimMIM: A Simple Framework for Masked Image Modeling⭐code
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition⭐code📰explainer
Mobile-Former: Bridging MobileNet and Transformer⭐code
MulT: An End-to-End Multitask Learning Transformer⭐code🏠project
Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning😮oral⭐code📰explainer
CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens⭐code
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes⭐code
Reversible Vision Transformers⭐code
MetaFormer Is Actually What You Need for Vision😮oral⭐code
GradViT: Gradient Inversion of Vision Transformers🏠project
CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows⭐code
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection⭐code📰Meta and Berkeley propose a general multiscale vision Transformer based on pooling self-attention, reaching 88.8% ImageNet classification accuracy; open source
A-ViT: Adaptive Tokens for Efficient
Vision Transformer😮oral🏠project📰Unimportant tokens can stop computing early! NVIDIA proposes A-ViT, an efficient vision Transformer with adaptive tokens that raises model throughput
Certified Patch Robustness via Smoothed Vision Transformers⭐code
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy⭐code
Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training⭐code
Object-Region Video Transformers⭐code🏠project
Shunted Self-Attention via Multi-Scale Token Aggregation😮oral⭐code
Towards Robust Vision Transformer⭐code
Fine-tuning Image Transformers using Learnable Memory
Lite Vision Transformer With Enhanced Self-Attention⭐code
Self-Supervised Video Transformer⭐code
TransMix: Attend To Mix for Vision Transformers⭐code
CMT: Convolutional Neural Networks Meet Vision Transformers⭐code

Shape Completion
ShapeFormer: Transformer-based Shape Completion via Sparse Representation⭐code🏠project

14.Video
Improving Video Model Transfer With Dynamic Representation Learning

Action Segmentation
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering📺video
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering

Action Understanding
How Do You Do It?
Fine-Grained Action Understanding with Pseudo-Adverbs⭐code
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos⭐code

Video Copy Detection
A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection⭐code

Video Synthesis
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning⭐code
Playable Environments: Video Manipulation in Space and Time⭐code🏠project
3D Moments from Near-Duplicate Photos🏠project
Neural 3D Video Synthesis From Multi-View Video😮oral🏠project

Video Anomaly Detection
Generative Cooperative Learning for Unsupervised Video Anomaly Detection
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection
Deep Anomaly Discovery From Unlabeled Videos via Normality Advantage and Self-Paced Refinement
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection⭐code

Trajectory Prediction in Video Surveillance
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion⭐code
Non-Probability Sampling Network for Stochastic Human Trajectory Prediction⭐code

Video Moment Retrieval and Video Highlight Detection
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection⭐code
Learning Pixel-Level Distinctions for Video Highlight Detection
Contrastive Learning for Unsupervised Video Highlight Detection⭐code

Video Moment Retrieval
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

Video Prediction
STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction
Continual Predictive Learning from Videos😮oral⭐code
SimVP: Simpler yet Better Video Prediction⭐code📰explainer
Comparing Correspondences: Video Prediction With Correspondence-Wise Losses⭐code🏠project

Video Individual Counting
DR.VIC: Decomposition and Reasoning for Video Individual Counting⭐code

Video Interpolation
Many-to-many Splatting for Efficient Video Frame Interpolation⭐code
TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation🏠project
Long-term Video
Frame Interpolation via Feature Propagation
Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion🏠project

Visual Correspondence (Video)
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning⭐code

Video Recognition
BEVT: BERT Pretraining of Video Transformers⭐code📰A new paradigm for self-supervised pretraining of video Transformers: Fudan and Microsoft Cloud AI reach a new SOTA in video recognition
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing⭐code📰explainer
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition📰Keep the model's memory! Meta and UC Berkeley propose MeMViT, with temporal support 30x longer than existing models for only 4.5% more compute
Multiview Transformers for Video Recognition⭐code
Group Contextualization for Video Recognition⭐code
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition⭐code

Video Classification

Zero-Shot Video Classification
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification⭐code

Video Action Classification
Learning To Recognize Procedural Activities With Distant Supervision⭐code

Video Prediction
Modular Action Concept Grounding in Semantic Video Prediction🏠project

Hand Motion Prediction
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos🏠project📺video

Video Segmentation
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation⭐code

VSS
Scene Consistency Representation Learning for Video Scene Segmentation⭐code📰explainer1📰explainer2

VOS
Recurrent Dynamic Embedding for Video Object Segmentation⭐code
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation⭐code🏠project📰Beihang University, the Institute of Information Engineering and Meituan propose LBDT, which uses language-bridged spatial-temporal interaction for accurate referring video object segmentation with SOTA performance; code is open source (CVPR 2022)
Accelerating Video Object Segmentation With Compressed Video⭐code📰CoVOS: no decoding needed! Accelerating semi-supervised VOS with motion vectors and residuals from the compressed video bitstream (CVPR 2022)
End-to-End Referring Video Object Segmentation With Multimodal Transformers⭐code
HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images⭐code
SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization
Language As Queries for Referring Video Object
Segmentation⭐code
Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks⭐code
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset⭐code🏠project
Per-Clip Video Object Segmentation

Video Instance Segmentation (VIS)
Efficient Video Instance Segmentation via Tracklet Query and Proposal🏠project📺video📰brief explainer
Temporally Efficient Vision Transformer for Video Instance Segmentation😮oral⭐code📰explainer
VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation⭐code
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation

Video Semantic Segmentation
Coarse-to-Fine Feature Mining for Video Semantic Segmentation⭐code

Video Panoptic Segmentation
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation😮oral⭐code📰explainer
Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation⭐code
Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark⭐code

Video Processing

Video Restoration
Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature⭐code
Neural Compression-Based Feature Learning for Video Restoration

Video Inpainting
Towards An End-to-End Framework for Flow-Guided Video Inpainting⭐code
The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting⭐code
Revisiting Temporal Alignment for Video Restoration⭐code
DLFormer: Discrete Latent Transformer for Video Inpainting⭐code
Inertia-Guided Flow Completion and Style Fusion for Video Inpainting⭐code

Video Demoireing
Video Demoireing with Relation-Based Temporal Consistency🏠project📺video

Video Deblurring
Multi-Scale Memory-Based Video Deblurring⭐code

Video Denoising
Dancing under the stars: video denoising in starlight⭐code

Film Restoration
Bringing Old Films Back to Life⭐code

Video Representation Learning
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition😮oral⭐code📰explainer
Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging⭐code
Motion-Adjustable Neural Implicit Video Representation
Self-Supervised Video Representation Learning
Hierarchical Self-supervised Representation Learning for Movie Understanding⭐code🏠project
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency⭐code🏠project
Cross-Architecture Self-supervised Video Representation Learning⭐code📰explainer📰Can features from different network architectures be contrasted too? Ant Group, Meituan, Nanjing University and Alibaba propose CACL, a cross-architecture self-supervised video representation learning method with SOTA performance

Video Contrastive Learning
Probabilistic Representations for Video Contrastive Learning

Video Decomposition
Deformable Sprites for Unsupervised Video Decomposition😮oral🏠project

Video Shadow Detection
Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training⭐code

Video Frame Interpolation
IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation⭐code📰explainer
Video Frame Interpolation with Transformer⭐code📰explainer
Video Frame Interpolation Transformer⭐code
Optimizing Video Prediction via Video Frame Interpolation
ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation⭐code

Video Understanding
Revisiting the "Video" in Video-Language Understanding😮oral⭐code
Long-Short Temporal Contrastive Learning of Video Transformers

Generic Event Boundary Detection (Video Understanding)
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection⭐code
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection

Video Captioning
End-to-End Generative Pretraining for Multimodal Video Captioning📰Google's multimodal pretraining framework: SOTA across video captioning, action classification and question answering
Hierarchical Modular Network for Video Captioning⭐code
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning⭐code
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching⭐code

Video Reconstruction
E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations
Context-Aware Video Reconstruction for Rolling Shutter Cameras⭐code📰explainer

Video Similarity Evaluation
Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation🏠project

Video Summarization
Joint Video
Summarization and Moment Localization by Cross-Task Sample Transfer🏠project
IntentVizor: Towards Generic Query Guided Interactive Video Summarization⭐code

Video Coding
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
Learning Based Multi-Modality Image and Video Compression
Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction
LSVC: A Learning-Based Stereo Video Compression Framework

Video Modeling
Stand-Alone Inter-Frame Attention in Video Models⭐code📰explainer

Video Paragraph Grounding
Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

Sentence Grounding
Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning⭐code

Sequence Verification
SVIP: Sequence VerIfication for Procedures in Videos🏠project

Video Editing
M3L: Language-Based Video Editing via Multi-Modal Multi-Level Transformers

Video Visual Relation Detection
VRDFormer: End-to-End Video Visual Relation Detection With Transformers⭐code

Video Action Reasoning
Complex Video Action Reasoning via Learnable Markov Logic Network

Video Reconstruction
Event-based Video Reconstruction via Potential-assisted Spiking Neural Network⭐code🏠project

13.GAN
🐦️HyperInverter: Improving StyleGAN Inversion via Hypernetwork🏠project
InsetGAN for Full-Body Image Generation🏠project📰1024x1024 resolution with striking results! InsetGAN: full-body image generation
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data⭐code
Deep Image-based Illumination Harmonization
GAN-Supervised Dense Visual Alignment😮oral⭐code🏠project📺video📰CVPR 2022 Oral: GAN-supervised dense visual alignment, code open source
Styleformer: Transformer Based Generative Adversarial Networks With Style Vector⭐code📰explainer
HairMapper: Removing Hair from Portraits Using GANs⭐code
Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps😮oral🏠project
Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models
On Aliased Resizing and Surprising Subtleties in GAN Evaluation⭐code🏠project
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
Depth-Aware Generative Adversarial
Network for Talking Head Video Generation⭐code
Efficient Geometry-Aware 3D Generative Adversarial Networks⭐code🏠project
DO-GAN: A Double Oracle Framework for Generative Adversarial Networks
GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation⭐code
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs⭐code🏠project📺video
HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing⭐code🏠project
Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
Improving GAN Equilibrium by Raising Spatial Awareness⭐code🏠project
SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation⭐code
Think Twice Before Detecting GAN-Generated Fake Images From Their Spectral Domain Imprints
Ensembling Off-the-Shelf Models for GAN Training😮oral⭐code🏠project
Style Transformer for Image Inversion and Editing⭐code
BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations⭐code🏠project
High-Fidelity GAN Inversion for Image Attribute Editing⭐code🏠project
Manifold Learning Benefits GANs⭐code
BodyGAN: General-Purpose Controllable Neural Human Body Generation
Feature Statistics Mixing Regularization for Generative Adversarial Networks⭐code
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2⭐code🏠project
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing⭐code🏠project
LARGE: Latent-Based Regression Through GAN Semantics⭐code

Image Manipulation Detection
Proactive Image Manipulation Detection⭐code

Hair Editing
HairCLIP: Design Your Hair by Text and Reference Image⭐code

12.Image-to-Image Translation
Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation⭐code
InstaFormer: Instance-Aware Image-to-Image
Translation with Transformer
Unsupervised Image-to-Image Translation with Generative Prior⭐code🏠project📺video
Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint⭐code📰explainer
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation⭐code
Self-Supervised Dense Consistency Regularization for Image-to-Image Translation

11.Face
Synthetic Generation of Face Videos With Plethysmograph Physiology🏠project
Protecting Celebrities with Identity Consistency Transformer
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer⭐code
How Much Does Input Data Type Impact Final Face Model Accuracy?
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion⭐code🏠project
Estimating Structural Disparities for Face Models
General Facial Representation Learning in a Visual-Linguistic Manner😮oral⭐code

Deepfake
Voice-Face Homogeneity Tells Deepfake⭐code📰brief explainer

Makeup Transfer
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer⭐code

Face Recognition
Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation
Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC⭐code
AdaFace: Quality Adaptive Margin for Face Recognition😮oral⭐code
Learning To Learn Across Diverse Data Biases in Deep Face Recognition
Simulated Adversarial Testing of Face Recognition Models
Privacy-Preserving Online AutoML for Domain-Specific Face Detection
An Efficient Training Approach for Very Large Scale Face Recognition⭐code

Facial Expression Recognition
Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin⭐code
Multi-Dimensional, Nuanced and Subjective - Measuring the Perception of Facial Expressions
Face2Exp: Combating Data Biases for Facial Expression Recognition⭐code
Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in "In-the-Wild" Videos😮oral⭐code🏠project

3D Portraits
RigNeRF: Fully Controllable Neural 3D Portraits

3D Face
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
Learning to Restore 3D Face from In-the-Wild Degraded Images📰explainer

Face Anti-Spoofing
PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch Recognition
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing⭐code

Face Forgery Detection
Exploring Frequency Adversarial Attacks for Face Forgery Detection📰brief explainer
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
End-to-End Reconstruction-Classification Learning for Face Forgery Detection📰explainer
Learning Second Order Local Anomaly for General Face Forgery Detection
Protecting Celebrities From DeepFake With Identity Consistency Transformer⭐code

Face Swapping
High-resolution Face Swapping via Latent Semantics Disentanglement⭐code
Region-Aware Face Swapping
Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness

Facial Attribute Classification
Fair Contrastive Learning for Facial Attribute Classification⭐code

Face Relighting
Face Relighting with Geometrically Consistent Shadows⭐code

Face Editing
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing⭐code🏠project
FENeRF: Face Editing in Neural Radiance Fields⭐code🏠project

Face Hallucination
Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination

Deepfake Detection
Detecting Deepfakes with Self-Blended Images😮oral⭐code
DeepFake Disrupter: The Detector of DeepFake Is My Friend

Face Reconstruction
JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction⭐code🏠project📰explainer

3D Face Reconstruction
Generating Diverse 3D Reconstructions From a Single Occluded Face Image⭐code

Face Capture
EMOCA: Emotion Driven Monocular Face Capture and
Animation🏠project

Head Swapping
Few-Shot Head Swapping in the Wild😮oral⭐code🏠project📺video📰explainer

Portrait Distortion Correction
Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer⭐code📰explainer

3D Face Modeling
Physically-guided Disentangled Implicit Rendering for 3D Face Modeling📰explainer

Face Restoration
Blind Face Restoration via Integrating Face Shape and Generative Priors⭐code📰explainer
Rethinking Deep Face Restoration
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs⭐code
Learning to Restore 3D Face from In-the-Wild Degraded Images

Face Alignment
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning⭐code
Occlusion-Robust Face Alignment Using a Viewpoint-Invariant Hierarchical Network Architecture⭐code

Speech-Driven 3D Facial Animation
FaceFormer: Speech-Driven 3D Facial Animation with Transformers⭐code🏠project

3D Tongue Reconstruction
3D Human Tongue Reconstruction From Single "In-the-Wild" Images⭐code

Image Forgery Detection
Robust Image Forgery Detection Over Online Social Network Shared Images⭐code

Face Parsing
Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing⭐code

Facial Expression
Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality

Face Detection
MogFace: Towards a Deeper Appreciation on Face Detection⭐code

Face Reenactment
Dual-Generator Face Reenactment⭐code

Talking Face Generation
Talking Face Generation With Multilingual TTS🏠project
Expressive Talking Head Generation With Granular Audio-Visual Control

Facial Landmarks
Towards Accurate Facial Landmark Detection via Cascaded Transformers

Face Morphing
FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From a Hybrid Dataset⭐code

3D Facial Expression Synthesis
Sparse to Dense Dynamic 3D Facial Expression Generation⭐code

Speech-Driven Tongue Animation
Speech Driven Tongue Animation⭐code🏠project

Text-to-Face
AnyFace: Free-Style Text-To-Face Synthesis and Manipulation

Facial Action Unit Recognition
Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition

Face Verification
DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover's Distance Improves Out-of-Distribution Face Identification⭐code

10.3D(3D Vision)
Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows⭐code
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection😮oral⭐code📰explainer
Physical Simulation Layer for Accurate 3D Modeling
φ-SfT: Shape-from-Template with a Physics-Based Deformation Model🏠project
ICON: Implicit Clothed Humans Obtained From Normals⭐code🏠project
Representing 3D Shapes With Probabilistic Directed Distance Fields
Improving Neural Implicit Surfaces Geometry With Patch Warping⭐code
LOLNerf: Learn From One Look🏠project
Neural Mesh Simplification
Extracting Triangular 3D Models, Materials, and Lighting From Images😮oral⭐code🏠project
PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos⭐code🏠project
The Wanderings of Odysseus in 3D Scenes⭐code🏠project
Volumetric Bundle Adjustment for Online Photorealistic Scene Capture

Stereo Merging
PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation
GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented Feature⭐code
Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation😮oral⭐code📰explainer

Stereo Matching
Chitransformer: Towards Reliable Stereo From Cues⭐code
Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
FoggyStereo: Stereo Matching With Fog Volume Representation
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks⭐code

Depth Estimation
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion😮oral⭐code
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation⭐code🏠project
🐦️Toward Practical Self-Supervised Monocular Indoor Depth Estimation
P3Depth: Monocular Depth Estimation with a Piecewise Planarity
Prior⭐code
HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
Multi-Frame Self-Supervised Depth with Transformers
Layered Depth Refinement with Mask Guidance🏠project
360MonoDepth: High-Resolution 360° Monocular Depth Estimation⭐code🏠project
Towards Multimodal Depth Estimation from Light Fields
Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation⭐code
Multi-View Depth Estimation by Fusing Single-View Depth Probability With Multi-View Geometry😮oral⭐code
Toward Practical Monocular Indoor Depth Estimation
Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data
Stereo Depth From Events Cameras: Concentrate and Focus on the Future⭐code
Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light⭐code
CroMo: Cross-Modal Learning for Monocular Depth Estimation
Deep Depth From Focus With Differential Focus Volume
Gated2Gated: Self-Supervised Depth Estimation From Gated Images⭐code

Room Layout
LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network⭐code📰brief explainer

MVS
RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo
TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers⭐code📰explainer
Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo⭐code
IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo⭐code
Generalized Binary Search Network for Highly-Efficient Multi-View Stereo⭐code
Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering⭐code🏠project
Efficient Multi-View Stereo by Iterative Dynamic Cost Volume⭐code
MVS2D: Efficient Multi-View Stereo via Attention-Driven 2D Convolutions⭐code

MVPS
Uncertainty-Aware Deep Multi-View Photometric Stereo

3D Reconstruction
PlaneMVS: 3D
Plane Reconstruction from Multi-View Stereo Self-supervised Neural Articulated Shape and Appearance Models🏠project BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion Topologically-Aware Deformation Fields for Single-View 3D Reconstruction⭐code🏠project Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction⭐code🏠project📰explainer What's in your hands? 3D Reconstruction of Generic Objects in Hands⭐code🏠project📺video📰explainer Surface Reconstruction from Point Clouds by Learning Predictive Context Priors⭐code FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction⭐code📰explainer KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos SPAMs: Structured Implicit Parametric Models🏠project📺video Enhancing Face Recognition With Self-Supervised 3D Reconstruction Neural Fields As Learnable Kernels for 3D Reconstruction🏠project Input-Level Inductive Biases for 3D Reconstruction Human-Aware Object Placement for Visual Environment Reconstruction⭐code🏠project Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction⭐code OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction⭐code🏠project

3D scene reconstruction
Neural 3D Scene Reconstruction with the Manhattan-world Assumption😮oral⭐code🏠project📺video📰explainer StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions⭐code🏠project📺video PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes⭐code Look Outside the Room: Synthesizing a Consistent Long-Term 3D Scene Video From a Single Image⭐code🏠project NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction⭐code🏠project

Hand-object reconstruction
Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution

3D garment mesh reconstruction
Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images🏠project Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing🏠project

3D mesh reconstruction
Neural
Template: Topology-aware Reconstruction and Disentangled Generation of 3D Meshes⭐code📰explainer

3D shape reconstruction
3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow⭐code GIFS: Neural Implicit Function for General Shape Representation🏠project

3D garment deformation
SNUG: Self-Supervised Neural Dynamic Garments😮oral⭐code

Texture transfer and synthesis
AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis⭐code🏠project📺video

Shape matching
A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching⭐code Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching⭐code

Surface reconstruction
Critical Regularizations for Neural Surface Reconstruction in the Wild POCO: Point Convolution for Surface Reconstruction⭐code Neural RGB-D Surface Reconstruction

Multi-view mesh reconstruction
Multi-View Mesh Reconstruction With Neural Deferred Shading

3D shape analysis
Medial Spectral Coordinates for 3D Shape Analysis Learning Deep Implicit Functions for 3D Shapes with Dynamic Code Clouds⭐code

3D completion
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation⭐code🏠project

Image-based reconstruction
Image Based Reconstruction of Liquids from 2D Surface Detections⭐code

PS
Fast Light-Weight Near-Field Photometric Stereo

3D object shape prediction
Learning 3D Object Shape and Layout Without 3D Supervision🏠project

3D shape
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces⭐code

Neural 3D content generation
StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation🏠project

Depth completion
RGB-Depth Fusion GAN for Indoor Depth Completion GuideFormer: Transformers for Image Guided Depth Completion Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion

Line segment reconstruction
ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance

Shape reconstruction
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow⭐code

3D shape generation
Towards Implicit Text-Guided 3D Shape Generation⭐code

3D dog shape
BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information🏠project

3D Part
Segmentation AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation

3D semantic scene completion
MonoScene: Monocular 3D Semantic Scene Completion⭐code

9. Human Pose Estimation

COAP: Compositional Articulated Occupancy of People⭐code🏠project📺video📰explainer Context-Aware Sequence Alignment using 4D Skeletal Augmentation😮oral⭐code🏠project Generalizable Human Pose Triangulation Location-Free Human Pose Estimation📰explainer Meta Agent Teaming Active Learning for Pose Estimation Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation⭐code

Multi-person pose estimation
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation⭐code End-to-End Multi-Person Pose Estimation With Transformers⭐code Contextual Instance Decoupling for Robust Multi-Person Pose Estimation⭐code

Video-based HPE
Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation😮oral⭐code

3D pose
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video⭐code PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision😮oral⭐code Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation🏠project Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation📰Accurate and efficient multi-person 3D pose estimation: a distribution-aware single-stage model from Meitu and Beihang University Forecasting Characteristic 3D Poses of Human Actions📺video Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization⭐code Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision🏠project ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses⭐code MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation⭐code PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video⭐code🏠project GraFormer: Graph-Oriented
Transformer for 3D Pose Estimation AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision⭐code🏠project Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

4D human capture
H4D: Human 4D Modeling by Learning Neural Compositional Representation⭐code🏠project

Motion capture
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture🏠project A Low-Cost & Real-Time Motion Capture System📺video LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds

Arm-hand dynamic estimation
Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation

3D human shape
OSSO: Obtaining Skeletal Shape from Outside⭐code🏠project📺video📰explainer

Dense correspondence
BodyMap: Learning Full-Body Dense Correspondence Map🏠project

3D human motion reconstruction
Differentiable Dynamics for Articulated 3d Human Motion Reconstruction

3D human pose reconstruction
Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video Putting People in their Place: Monocular Regression of 3D People in Depth⭐code📰explainer

Human mesh recovery
Human Mesh Recovery From Multiple Shots⭐code🏠project Occluded Human Mesh Recovery🏠project GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras😮oral⭐code🏠project

Human motion description
Programmatic Concept Learning for Human Motion Description and Synthesis🏠project

3D human motion
Generating Diverse and Natural 3D Human Motions From Text⭐code🏠project

3D human synthesis
Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis⭐code🏠project

HSC
Capturing and Inferring Dense Full-Body Human-Scene Contact⭐code🏠project📺video

3D human motion synthesis
Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis

Human reconstruction
DoubleField: Bridging the Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering🏠project SMPL-A: Modeling Person-Specific Deformable Anatomy SelfRecon: Self Reconstruction Your Digital Avatar From
Monocular Video😮oral⭐code

Hand pose

Hand mesh reconstruction
MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image⭐code

3D hand pose
Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation⭐code

Audio-driven gesture reenactment
Audio-driven Neural Gesture Reenactment with Video Motion Graphs⭐code

3D hand reconstruction
LISA: Learning Implicit Shape and Appearance of Hands🏠project

Hand tracking
Whose Hands Are These? Hand Detection and Hand-Body Association in the Wild Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in the Wild

Gesture generation
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation⭐code🏠project

3D hand mesh estimation
HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network⭐code

3D human body
Accurate 3D Body Shape Regression Using Metric and Semantic Attributes

8. Action Detection (Human Action Detection and Recognition)

Action detection
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars⭐code Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos SPAct: Self-supervised Privacy Preservation for Action Recognition⭐code Temporal Alignment Networks for Long-term Video😮oral⭐code🏠project📰brief explainer SOS!
Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition GateHUB: Gated History Unit With Background Suppression for Online Action Detection MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection⭐code📰MS-TCT: Inria and SBU propose a multi-scale temporal Transformer for action detection with SOTA results; code released (CVPR 2022) Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos🏠project Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition Learning From Temporal Gradient for Semi-Supervised Action Recognition⭐code DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition⭐code Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition Object-Relation Reasoning Graph for Action Recognition Revisiting Skeleton-Based Action Recognition😮oral⭐code InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition⭐code End-to-End Semi-Supervised Learning for Video Action Detection⭐code Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models😮oral TubeR: Tubelet Transformer for Video Action Detection😮oral🏠project

Semi-supervised action recognition
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition🏠project

Zero-shot action recognition
Cross-modal Representation Learning for Zero-shot Action Recognition⭐code (zero-shot action recognition via cross-modal representation learning)

Few-shot action recognition
Hybrid Relation Guided Set Matching for Few-shot Action Recognition⭐code📰explainer Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition Spatio-Temporal Relation Modeling for Few-Shot Action Recognition⭐code

Temporal action detection
An Empirical Study of End-to-End Temporal Action Detection⭐code📰brief explainer RCL: Recurrent Continuous Localization for Temporal Action Detection

Temporal action localization
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation⭐code📰brief explainer Unsupervised Pre-training for Temporal Action Localization Tasks⭐code ASM-Loc:
Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization⭐code Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization⭐code Structured Attention Composition for Temporal Action Localization⭐code Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization OpenTAL: Towards Open Set Temporal Action Localization⭐code

Repetitive action counting
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting😮oral⭐code🏠project

Group activity recognition
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition😮oral🏠project Detector-Free Weakly Supervised Group Activity Recognition⭐code🏠project

Action quality assessment
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment😮oral⭐code🏠project📰explainer

Activity recognition
Audio-Adaptive Activity Recognition Across Video Domains⭐code🏠project

7. Point Cloud

Shape-invariant 3D Adversarial Point Clouds⭐code AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception⭐code REGTR: End-to-end Point Cloud Correspondences with Transformers⭐code Equivariant Point Cloud Analysis via Learning Orientations for Message Passing⭐code Text2Pos: Text-to-Point-Cloud Cross-Modal Localization⭐code🏠project Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds⭐code Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation⭐code📰explainer 3DeformRS: Certifying Spatial Deformations on Point Clouds⭐code Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors⭐code📰explainer Density-preserving Deep Point Cloud Compression⭐code🏠project📰explainer Surface Representation for Point Clouds😮oral⭐code📰explainer 1📰explainer 2 Neural Points: Point Cloud Representation With Neural Fields for Arbitrary
Upsampling⭐code Point Cloud Pre-Training With Natural 3D Structures⭐code🏠project Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds⭐code Point2Cyl: Reverse Engineering 3D Objects from Point Clouds to Extrusion Cylinders RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior PatchFormer: An Efficient Point Transformer With Patch Attention PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images Point Cloud Color Constancy⭐code Multimodal Colored Point Cloud to Image Alignment No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models by Fitting Feature-Level Space-Time Surfaces⭐code Domain Adaptation on Point Clouds via Geometry-Aware Implicits⭐code ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds 3DAC: Learning Attribute Compression for Point Clouds RCP: Recurrent Closest Point for Point Cloud⭐code Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds⭐code🏠project The Devil Is in the Pose: Ambiguity-Free 3D Rotation-Invariant Learning via Pose-Aware Convolution

3D point cloud
Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling⭐code CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding⭐code📰brief explainer: CrossPoint is a simple self-supervised learning framework for 3D point cloud representation learning. Although it is trained on a synthetic 3D object dataset, experiments on downstream tasks such as 3D object classification and 3D object part segmentation, on both synthetic and real-world datasets, demonstrate its effectiveness at learning transferable representations. IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment⭐code A Unified Query-based Paradigm for Point Cloud Understanding⭐code WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation⭐code 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial
Attacks With Implicit Gradients🏠project Why Discard if You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis Upright-Net: Learning Upright Orientation for 3D Point Cloud

3D point cloud segmentation
Stratified Transformer for 3D Point Cloud Segmentation⭐code

Point cloud classification
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation⭐code📰brief explainer📓

Point cloud registration
SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration⭐code📰A second-order similarity measure lets traditional registration methods outperform deep learning while matching its speed Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering⭐code Deterministic Point Cloud Registration via Novel Transformation Decomposition📰explainer Geometric Transformer for Fast and Robust Point Cloud Registration⭐code

Point cloud completion
Learning a Structured Latent Space for Unsupervised Point Cloud Completion Learning Local Displacements for Point Cloud Completion LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints📰brief explainer

Point cloud segmentation
Contrastive Boundary Learning for Point Cloud Segmentation⭐code📰explainer SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation⭐code📰explainer An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation⭐code Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation⭐code

Point cloud matching
Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes⭐code

Scene flow estimation
RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

Point cloud understanding
PointCLIP: Point Cloud Understanding by CLIP⭐code

6. Object Tracking

TCTrack: Temporal Contexts for Aerial Tracking⭐code📰brief explainer📰TCTrack: a temporal-context framework for aerial tracking Correlation-Aware Deep Tracking Global Tracking Transformers⭐code Unified Transformer Tracker for Object Tracking⭐code Global Tracking via Ensemble of Local Trackers⭐code Unsupervised Learning of Accurate Siamese
Tracking⭐code Transformer Tracking with Cyclic Shifting Window Attention⭐code (Transformer tracking with a cyclic shifting window attention model; the method sets a new state of the art on five datasets: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k.) Tracking People by Predicting 3D Appearance, Location and Pose😮oral⭐code🏠project Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos⭐code Opening Up Open World Tracking😮oral⭐code🏠project Transforming Model Prediction for Tracking⭐code PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments⭐code Spiking Transformers for Event-Based Single Object Tracking⭐code MixFormer: End-to-End Tracking With Iterative Mixed Attention😮oral⭐code PTTR: Relational 3D Point Cloud Object Tracking With Transformer⭐code GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking⭐code

3D object tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds⭐code📰brief explainer Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects⭐code BCOT: A Markerless High-Precision 3D Object Tracking Benchmark⭐code

Multi-object tracking
Learning of Global Objective for Network Flow in Multi-Object Tracking MeMOT: Multi-Object Tracking with Memory😮oral Multi-Object Tracking Meets Moving UAV Adiabatic Quantum Computing for Multi Object Tracking Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking⭐code TrackFormer: Multi-Object Tracking With Transformers⭐code DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion⭐code

RGB-T tracking
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline🏠project📰explainer

Visual tracking
Ranking-Based Siamese Visual Tracking⭐code📰explainer

Nighttime tracking
Unsupervised Domain Adaptation for Nighttime Aerial Tracking⭐code

Human motion tracking
Physical Inertial Poser (PIP):
Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors⭐code🏠project

Multi-person pose tracking
PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking⭐code

5. Object Detection

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising⭐code📰brief explainer Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation⭐code ESCNet: Gaze Target Detection with the Understanding of 3D Scenes⭐code Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection⭐code Interactron: Embodied Adaptive Object Detection⭐code Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection (Object detectors have typically been trained with bounding-box annotations; the authors introduce language prompts and distill linguistic knowledge into the detection model, gaining 1.6 to 2.1% in performance.) Dynamic Sparse R-CNN Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild⭐code📰brief explainer Focal and Global Knowledge Distillation for Detectors⭐code📰explainer: a knowledge distillation method for object detection that needs only 30 lines of code to give stable gains across anchor-based and anchor-free, single-stage and two-stage detectors; the code is open source. Group R-CNN for Weakly Semi-supervised Object Detection with Points⭐code📰explainer Real-time Object Detection for Streaming Perception⭐code📰explainer Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model⭐code Optimal Correction Cost for Object Detection Evaluation Expanding Low-Density Latent Regions for Open-Set Object Detection⭐code📰explainer SIOD: Single Instance Annotated Per Category Per Image for Object Detection⭐code📰explainer Task-specific Inconsistency Alignment for Domain Adaptive Object Detection⭐code Zero-Query Transfer Attacks on Context-Aware Object Detectors AdaMixer: A Fast-Converging Query-Based Object Detector😮oral⭐code Learning to Detect Mobile Objects from LiDAR Scans Without Labels⭐code Forecasting from LiDAR via Future Object Detection⭐code Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object
Detection😮oral⭐code Multi-Granularity Alignment Domain Adaptation for Object Detection⭐code Proper Reuse of Image Classification Features Improves Object Detection⭐code R(Det)^2: Randomized Decision Routing for Object Detection Towards Robust Adaptive Object Detection under Noisy Annotations⭐code Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images⭐code Cross Domain Object Detection by Target-Perceived Dual Branch Distillation⭐code (cross-domain object detection: target-perceived dual-branch distillation) Progressive End-to-End Object Detection in Crowded Scenes⭐code📰explainer HCSC: Hierarchical Contrastive Selective Coding⭐code📰New SOTA for CNN self-supervised pre-training: SJTU, Mila, and ByteDance jointly propose a new hierarchically structured framework for self-supervised image representation learning Recurrent Glimpse-based Decoder for Detection with Transformer😮oral⭐code📰explainer Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism⭐code Balanced and Hierarchical Relation Learning for One-Shot Object Detection⭐code Accelerating DETR Convergence via Semantic-Aligned Matching⭐code DETReg: Unsupervised Pretraining With Region Priors for Object Detection⭐code🏠project Source-Free Object Detection by Learning To Overlook Domain Style DESTR: Object Detection With Split Transformer SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline⭐code Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network Not All Labels Are Equal: Rationalizing the Labeling Costs for Training Object Detection⭐code Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer Sequential Voting With Relational Box Fields for Active Object Detection⭐code🏠project Simple Multi-dataset Detection⭐code ObjectFormer for Image Manipulation Detection and
Localization A Dual Weighting Label Assignment Scheme for Object Detection⭐code Point-Level Region Contrast for Object Detection Pre-Training😮oral Neural Volumetric Object Selection🏠project Confidence Propagation Cluster: Unleash Full Potential of Object Detectors Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation⭐code DetectorDetective: Investigating the Effects of Adversarial Examples on Object Detectors📺video Cross-Domain Adaptive Teacher for Object Detection⭐code🏠project End-to-End Human-Gaze-Target Detection With Transformers

Small object detection
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection⭐code Interactive Multi-Class Tiny-Object Detection⭐code ISNet: Shape Matters for Infrared Small Target Detection⭐code📰explainer

Zero-shot object detection
Robust Region Feature Synthesizer for Zero-Shot Object Detection⭐code

Few-shot object detection
Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection Few-Shot Object Detection with Fully Cross-Transformer Kernelized Few-Shot Object Detection With Efficient Integral Aggregation⭐code Label, Verify, Correct: A Simple Few Shot Object Detection Method⭐code🏠project

Object localization
Weakly Supervised Object Localization as Domain Adaption⭐code📰brief explainer Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization Object Localization under Single Coarse Point Supervision⭐code📰explainer CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping⭐code Spatial Commonsense Graph for Object Localisation in Partial Scenes⭐code🏠project

3D object detection
Point Density-Aware Voxels for LiDAR 3D Object Detection⭐code A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds⭐code Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving⭐code📰brief explainer Rope3D: The Roadside Perception Dataset for Autonomous Driving and
Monocular 3D Object Detection Task🏠project Point2Seq: Detecting 3D Objects as Sequences⭐code MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection⭐code Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes⭐code📰brief explainer Exploring Geometric Consistency for Monocular 3D Object Detection LiDAR Snowfall Simulation for Robust 3D Object Detection😮oral⭐code CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection Homography Loss for Monocular 3D Object Detection HyperDet3D: Learning a Scene-conditioned 3D Object Detector DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection⭐code OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data⭐code Focal Sparse Convolutional Networks for 3D Object Detection😮oral⭐code📰explainer📓 Rotationally Equivariant 3D Object Detection🏠project Bridged Transformer for Vision and Point Cloud 3D Object Detection📰explainer Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion😮oral⭐code📰explainer VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention⭐code📰South China University of Technology proposes VISTA: a plug-and-play dual cross-view spatial attention mechanism that achieves SOTA 3D object detection Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection😮oral MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer⭐code Voxel Field Fusion for 3D Object Detection⭐code📰explainer DisARM: Displacement Aware Relation Module for 3D Detection⭐code Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement⭐code Embracing Single Stride 3D Object Detector With Sparse Transformer⭐code 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection🏠project Dimension Embeddings for Monocular 3D Object Detection MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection⭐code RBGNet: Ray-Based Grouping for 3D Object Detection⭐code LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object
Detection SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection⭐code MonoGround: Detecting Monocular 3D Objects From the Ground⭐code TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers⭐code Boosting 3D Object Detection by Simulating Multimodality on Point Clouds

Camouflaged object detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection⭐code Detecting Camouflaged Object in Frequency Domain Implicit Motion Handling for Video Camouflaged Object Detection🏠project Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way⭐code

Omni-supervised object detection
Omni-DETR: Omni-Supervised Object Detection with Transformers⭐code

Self-supervised object detection
Self-Supervised Object Detection From Audio-Visual Correspondence

Semi-supervised object detection
Dense Learning based Semi-Supervised Object Detection⭐code📰explainer Label Matching Semi-Supervised Object Detection⭐code Semi-Supervised Object Detection via Multi-Instance Alignment With Global Class Prototypes Active Teacher for Semi-Supervised Object Detection⭐code Scale-Equivalent Distillation for Semi-Supervised Object Detection Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection⭐code

Weakly supervised object detection
Salvage of Supervision in Weakly Supervised Object Detection Background Activation Suppression for Weakly Supervised Object Localization⭐code H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection⭐code

Salient object detection
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection⭐code📰explainer📰Ultra-high-resolution salient object detection with PGNet, a novel and efficient staggered grafting architecture (CVPR 2022) Learning from Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection⭐code📰explainer Bi-directional Object-context Prioritization Learning for Saliency Ranking⭐code Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection Learning From
Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection⭐code

Dense object detection
Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection⭐code

Co-salient object detection
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection⭐code Can You Spot the Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection⭐code

Long-tailed object detection
C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection Equalized Focal Loss for Dense Long-Tailed Object Detection⭐code Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection

Rotated object detection
OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection

Keypoint detection
Self-Supervised Equivariant Learning for Oriented Keypoint Detection⭐code🏠project UKPGAN: A General Self-Supervised Keypoint Detector⭐code📰brief explainer Contour-Hugging Heatmaps for Landmark Detection⭐code Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species

Keypoint discovery
Self-Supervised Keypoint Discovery in Behavioral Videos⭐code🏠project

Object discovery
Discovering Objects that Can Move

Affordance grounding
Learning Affordance Grounding from Exocentric Images⭐code📰explainer Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut⭐code🏠project

Image alignment
Unsupervised Homography Estimation with Coplanarity-Aware GAN⭐code📰explainer

Object attribute recognition
Disentangling Visual Embeddings for Attributes and Objects😮oral⭐code

Vanishing point detection
Deep vanishing point detection: Geometric priors make dataset variations vanish⭐code

Infrared detection
Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World😮oral

OOD
Deep Hybrid Models for Out-of-Distribution Detection Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization🌻dataset PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures⭐code The Two Dimensions of Worst-Case Training and Their
Integrated Effect for Out-of-Domain Generalization Out-of-Distribution Generalization With Causal Invariant Transformations ViM: Out-Of-Distribution with Virtual-logit Matching⭐code OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization⭐code Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

Open-world object detection
OW-DETR: Open-world Detection Transformer⭐code

Domain adaptive object detection
SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection⭐code

Dense object detection
Localization Distillation for Dense Object Detection⭐code

Image copy detection
A Self-Supervised Descriptor for Image Copy Detection⭐code

Change detection
Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches⭐code

Image recognition
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition⭐code

4. Image Captioning

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning⭐code Quantifying Societal Bias Amplification in Image Captioning It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection⭐code🏠project Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning DIFNet: Boosting Visual Information Flow for Image Captioning⭐code📰explainer VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning⭐code Comprehending and Ordering Semantics for Image Captioning⭐code📰explainer DeeCap: Dynamic Early Exiting for Efficient Image Captioning⭐code Show, Deconfound and Tell: Image Captioning With Causal Inference⭐code Scaling Up Vision-Language Pre-Training for Image Captioning🌻dataset NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models⭐code Injecting Semantic Concepts Into End-to-End Image Captioning

Novel Object Captioning
NOC-REK: Novel Object Captioning with
Retrieved Vocabulary from External Knowledge
3.Image Progress(图像处理)
Image Restoration(图像恢复)
Attentive Fine-Grained Structured Sparsity for Image Restoration⭐code📰explainer Uformer: A General U-Shaped Transformer for Image Restoration⭐code Burst Image Restoration and Enhancement😮oral⭐code BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras Restormer: Efficient Transformer for High-Resolution Image Restoration😮oral⭐code TransWeather: Transformer-Based Restoration of Images Degraded by Adverse Weather Conditions⭐code Deep Generalized Unfolding Networks for Image Restoration⭐code Self-Supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics⭐code All-in-One Image Restoration for Unknown Corruption⭐code Exploring and Evaluating Image Restoration Potential in Dynamic Scenes⭐code KNN Local Attention for Image Restoration
Image Inpainting(图像修复)
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding⭐code📰overview MAT: Mask-Aware Transformer for Large Hole Image Inpainting⭐code Reduce Information Loss in Transformers for Pluralistic Image Inpainting⭐code UniCoRN: A Unified Conditional Image Repainting Network Dual-Path Image Inpainting With Auxiliary GAN Inversion MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting⭐code
Image Stitching(图像拼接)
Deep Rectangling for Image Stitching: A Learning Baseline😮oral⭐code📰overview Automatic Color Image Stitching Using Quaternion Rank-1 Alignment Geometric Structure Preserving Warp for Natural Image Stitching⭐code
Motion Deblurring(运动去模糊)
Unifying Motion Deblurring and Frame Interpolation with Events⭐code
Image Outpainting
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation🏠project
Image Aesthetics Assessment(图像美学评估)
Personalized Image Aesthetics Assessment with Rich Attributes🏠project
Image Quality Assessment(图像质量评估)
Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment⭐code📰explainer
Image Deraining(图像去雨)
Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive
Benchmark Analysis and Beyond⭐code Dreaming To Prune Image Deraining Networks
Image Deblurring(图像去模糊)
Learning to Deblur using Light Field Generated and Real Defocus Images⭐code🏠project Pixel Screening Based Intermediate Correction for Blind Deblurring Deblurring via Stochastic Refinement XYDeblur: Divide and Conquer for Single Image Deblurring Towards Multi-Domain Single Image Dehazing via Test-Time Training
Image Compression(图像压缩)
SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention⭐code Global Sensing and Measurements Reuse for Image Compressed Sensing⭐code DPICT: Deep Progressive Image Compression Using Trit-Planes😮oral⭐code Joint Global and Local Hierarchical Priors for Learned Image Compression⭐code Neural Data-Dependent Transform for Learned Image Compression⭐code🏠project LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network⭐code ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding😮oral Deep Stereo Image Compression via Bi-Directional Coding Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression⭐code The Devil Is in the Details: Window-Based Attention for Image Compression⭐code
Lossless Image Compression(图像无损压缩)
PILC: Practical Image Lossless Compression With an End-to-End GPU Oriented Neural Framework
Image Denoising(图像去噪)
CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image⭐code NAN: Noise-Aware NeRFs for Burst-Denoising Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots⭐code AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network⭐code RePaint: Inpainting Using Denoising Diffusion Probabilistic Models⭐code Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching
Image Dehazing(图像去雾)
Image Dehazing Transformer with Transmission-Aware 3D Position Embedding🏠project
De-rendering
Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata⭐code📰explainer De-Rendering 3D
Objects in the Wild⭐code IDR: Self-Supervised Image Denoising via Iterative Data Refinement⭐code RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising⭐code Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition⭐code📰explainer📰D4: unpaired image dehazing, a self-augmented method based on density and depth decomposition (CVPR 2022)
Image Enhancement(图像增强)
Toward Fast, Flexible, and Robust Low-Light Image Enhancement😮oral⭐code📰explainer📰SCI: a fast, flexible, and robust low-light image enhancement method (CVPR 2022 Oral) AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image Enhancement⭐code Directional Self-supervised Learning for Heavy Image Augmentations⭐code📰explainer Abandoning the Bayer-Filter To See in the Dark⭐code URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement⭐code GIQE: Generic Image Quality Enhancement via Nth Order Iterative Degradation Deep Color Consistent Network for Low-Light Image Enhancement SNR-Aware Low-Light Image Enhancement⭐code
Image Harmonization(图像和谐化)
SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization⭐code High-Resolution Image Harmonization via Collaborative Dual Transformations⭐code
Image Outpainting(图像超级补全)
Scene Graph Expansion for Semantics-Guided Image Outpainting (This paper tackles an interesting problem: by expanding the image's scene graph, it generates semantically guided content beyond the image borders, which can help designers quickly produce natural, harmonious image extensions.)
Semantic Image Matching(语义图像匹配)
TransforMatcher: Match-to-Match Attention for Semantic Correspondence⭐code🏠project📰explainer
Image Retouching(图像修饰)
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo⭐code
Image Colorization(图像着色)
Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization
Image Correction(图像校正)
EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction⭐code
Image Decomposition(图像分解)
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition⭐code🏠project
Image Reconstruction(图像重建)
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction⭐code A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift⭐code
Image Registration(图像配准)
A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration⭐code NODEO: A Neural Ordinary
Differential Equation Based Optimization Framework for Deformable Image Registration RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment⭐code
Image Editing(图像编辑)
Brain-Supervised Image Editing
Image Rescaling(图像缩放)
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Image Color Editing(图像色彩编辑)
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing⭐code🏠project
Image Collage(图像拼图)
SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage⭐code
Image Cropping(图像裁剪)
Rethinking Image Cropping: Exploring Diverse Compositions From Global Views
Image Completion(图像补全)
Bridging Global Context Interactions for High-Fidelity Image Completion⭐code
Text-Guided Image Manipulation(基于文本指导的图像操作)
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation⭐code
Image Dewarping
Revisiting Document Image Dewarping by Grid Regularization
Adverse Weather Removal(恶劣天气消除)
Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model⭐code
Image Outpainting
InOut: Diverse Image Outpainting via GAN Inversion⭐code🏠project
Shadow Removal(消除阴影)
Bijective Mapping Network for Shadow Removal
Image Steganography(图像隐写术)
Robust Invertible Image Steganography
Sound-Guided Semantic Image Manipulation(声音引导的语义图像处理)
Sound-Guided Semantic Image Manipulation⭐code🏠project
Text-Driven Natural Image Editing(用于文本驱动的自然图像编辑)
Blended Diffusion for Text-driven Editing of Natural Images⭐code🏠project
Artifact Removal(伪影去除)
Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography
2.Image Segmentation(图像分割)
FocalClick: Towards Practical Interactive Image Segmentation⭐code📰overview Multimodal Material Segmentation Semantic-Aware Domain Generalized Segmentation😮oral⭐code ReSTR: Convolution-free Referring Image Segmentation Using Transformers⭐code🏠project CRIS: CLIP-Driven Referring Image Segmentation Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation🏠project (Panoptic Neural Fields: a semantic, object-aware neural scene representation newly proposed by Google. The representation can be used effectively for novel view synthesis, 2D panoptic segmentation, 3D scene editing, multi-view depth prediction, and other tasks. This may well become another trend-setting direction.)
FocusCut: Diving Into a Focus View in Interactive Segmentation🏠project Hyperbolic Image Segmentation⭐code Clustering Plotted Data by Image Segmentation⭐code Generalizable Cross-Modality Medical Image Segmentation via Style Augmentation and Dual Normalization⭐code Image Segmentation Using Text and Image Prompts⭐code📰Can CLIP do segmentation too? The University of Göttingen proposes CLIPSeg, a model that uses text and image prompts to handle three segmentation tasks at once, squeezing the most out of CLIP. ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation⭐code📰explainer Adaptive Early-Learning Correction for Segmentation From Noisy Annotations⭐code Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation Masked-Attention Mask Transformer for Universal Image Segmentation⭐code🏠project📰One model for three segmentation tasks, ahead of MaskFormer in both accuracy and efficiency: Meta and UIUC propose a universal segmentation model that outperforms task-specific models. Open source. High Quality Segmentation for Ultra High-Resolution Images⭐code LAVT: Language-Aware Vision Transformer for Referring Image Segmentation📰Oxford, Shanghai AI Lab, HKU, SenseTime, and Tsinghua jointly propose a strongly performing language-aware vision Transformer for referring image segmentation. Code is open source.
Instance Segmentation(实例分割)
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation⭐code📰overview Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling⭐code Sparse Instance Activation for Real-Time Instance Segmentation⭐code SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation🏠project Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity⭐code🏠project DArch: Dental Arch Prior-assisted 3D Tooth Instance Segmentation Relieving Long-tailed Instance Segmentation via Pairwise Class Balance⭐code📰explainer ContrastMask: Contrastive Learning to Segment Every Thing📰explainer: a partially supervised instance segmentation algorithm based on pixel-level contrastive learning. GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation⭐code TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation⭐code Pointly-Supervised Instance
Segmentation😮oral⭐code🏠project Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers⭐code Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement⭐code Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings⭐code Mask Transfiner for High-Quality Instance Segmentation⭐code
Semi-Supervised Instance Segmentation(半监督实例分割)
Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?⭐code
3D Instance Segmentation(3D 实例分割)
SoftGroup for 3D Instance Segmentation on Point Clouds⭐code📰overview 🐦️FreeSOLO: Learning to Segment Objects without Annotations⭐code
Few-Shot Segmentation(小样本分割)
iFS-RCNN: An Incremental Few-Shot Instance Segmenter
Semantic Segmentation(语义分割)
Generalized Few-Shot Semantic Segmentation⭐code Scribble-Supervised LiDAR Semantic Segmentation😮oral⭐code Novel Class Discovery in Semantic Segmentation⭐code🏠project Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation⭐code Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction⭐code Pin the Memory: Learning to Generalize Semantic Segmentation⭐code📰explainer Representation Compensation Networks for Continual Semantic Segmentation⭐code Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation⭐code📰explainer GroupViT: Semantic Segmentation Emerges from Text Supervision⭐code🏠project📺video📰Semantic segmentation without any pixel labels: UCSD and NVIDIA add a grouping module to ViT. Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation⭐code📰overview Deep Hierarchical Semantic Segmentation⭐code Semantic Segmentation by Early Region Proxy⭐code📰overview SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation⭐code Rethinking Semantic Segmentation: A Prototype View😮oral⭐code On the Road to Online Adaptation for Semantic Image Segmentation⭐code Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds⭐code NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night⭐code📰explainer TopFormer:
Token Pyramid Transformer for Mobile Semantic Segmentation⭐code Cross-Image Relational Knowledge Distillation for Semantic Segmentation⭐code📰explainer Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers⭐code Self-Supervised Learning of Object Parts for Semantic Segmentation⭐code Cross-view Transformers for real-time Map-view Semantic Segmentation😮oral⭐code Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization🏠project Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation⭐code📰explainer Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device Partial Class Activation Attention for Semantic Segmentation⭐code Incremental Learning in Semantic Segmentation From Image Labels⭐code HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization📰explainer ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation Domain-Agnostic Prior for Transfer Semantic Segmentation Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation Sparse and Complete Latent Organization for Geospatial Semantic Segmentation
3D Semantic Segmentation(3D语义分割)
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation🏠project Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation😮oral⭐code📰explainer Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation
Weakly Supervised Semantic Segmentation(弱监督语义分割)
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation⭐code📰overview Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation⭐code Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation⭐code Cross Language Image Matching for Weakly Supervised Semantic
Segmentation⭐code Multi-class Token Transformer for Weakly Supervised Semantic Segmentation⭐code Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers⭐code📰explainer Weakly Supervised Semantic Segmentation using Out-of-Distribution Data⭐code📰overview L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation⭐code Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation⭐code Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation⭐code C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation⭐code Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation⭐code
Unsupervised Semantic Segmentation(无监督语义分割)
Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation⭐code
Semi-Supervised Semantic Segmentation(半监督语义分割)
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels⭐code🏠project Semi-supervised Semantic Segmentation with Error Localization Network⭐code🏠project📰overview UCC: Uncertainty guided Cross-head Co-training for Semi-Supervised Semantic Segmentation Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation⭐code Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation⭐code ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation⭐code
Domain Adaptive Semantic Segmentation(域适应语义分割)
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation⭐code ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation⭐code DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation⭐code
Domain Generalized Semantic Segmentation(域泛化语义分割)
WildNet: Learning Domain Generalized Semantic Segmentation from
the Wild⭐code
Zero-Shot Semantic Segmentation(零样本语义分割)
Decoupling Zero-Shot Semantic Segmentation⭐code
Few-Shot Semantic Segmentation(小样本语义分割)
Learning Non-target Knowledge for Few-shot Semantic Segmentation⭐code📰explainer Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer
Cross-Domain Semantic Segmentation(跨域语义分割)
Undoing the Damage of Label Shift for Cross-Domain Semantic Segmentation⭐code
Action Segmentation(动作分割)
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos Fast and Unsupervised Action Boundary Detection for Action Segmentation
Scene Parsing(场景解析)
FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing⭐code Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing⭐code
Foggy Scene Segmentation(雾景分割)
FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation😮oral⭐code🏠project
Panoptic Segmentation(全景分割)
Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation Joint Forecasting of Panoptic Segmentations with Difference Attention⭐code📰explainer PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation⭐code📰explainer Amodal Panoptic Segmentation🏠project Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers⭐code
Matting(抠图)
Human Instance Matting via Mutual Guidance and Multi-Instance Refinement😮oral⭐code MatteFormer: Transformer-Based Image Matting via Prior-Tokens⭐code
Glass Segmentation(玻璃分割)
Glass Segmentation Using Intensity and Spectral Polarization Cues🏠project
Amodal Segmentation
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model⭐code
Scene Understanding(场景理解)
Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding ScanQA: 3D Question Answering for Spatial Scene Understanding⭐code Egocentric Scene Understanding via Multimodal Spatial Rectifier⭐code
Human Parsing(人体解析)
CDGNet: Class Distribution Guided Network for Human Parsing⭐code
Part Segmentation
Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles🏠project
Few-Shot Segmentation(小样本分割)
GanOrCon: Are Generative Models Useful for Few-Shot Segmentation?⭐code🏠project Learning What Not To Segment: A New Perspective on Few-Shot Segmentation😮oral⭐code
3D Segmentation(3D分割)
INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation⭐code
Part Segmentation(零件分割)
PartGlot: Learning Shape Part Segmentation From Language Reference Games😮oral⭐code
1.Other(其它)
Learning to Anticipate Future with Dynamic Context Removal⭐code📰overview Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks Instance-wise Occlusion and Depth Orders in Natural Scenes⭐code IFOR: Iterative Flow Minimization for Robotic Object Rearrangement🏠project PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence⭐code🏠project📺video📰overview LiT: Zero-Shot Transfer with Locked-image text Tuning CAFE: Learning to Condense Dataset by Aligning Features⭐code📰overview BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning⭐code📰overview📓 ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching⭐code📰overview Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values⭐code Do Explanations Explain?
Model Knows Best⭐code HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging⭐code E-CIR: Event-Enhanced Continuous Intensity Recovery⭐code 🐦️Transferability Estimation using Bhattacharyya Class Separability Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks⭐code GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction⭐code Differentially Private Federated Learning with Local Regularization and Sparsification Towards Efficient and Scalable Sharpness-Aware Minimization⭐code DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences⭐code📰overview Dynamic Dual-Output Diffusion Models Egocentric Prediction of Action Target in 3D Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning⭐code Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction⭐code Neural Reflectance for Shape Recovery with Shadow Handling⭐code DyRep: Bootstrapping Training with Dynamic Re-parameterization⭐code Enhancing Classifier Conservativeness and Robustness by Polynomiality Versatile Multi-Modal Pre-Training for Human-Centric Perception⭐code Attributable Visual Similarity Learning⭐code Optimizing Elimination Templates by Greedy Parameter Search Partially Does It: Towards Scene-Level FG-SBIR with Partial Input Bi-level Doubly Variational Learning for Energy-based Latent Variable Models Brain-inspired Multilayer Perceptron with Spiking Neurons ARCS: Accurate Rotation and Correspondence Search⭐code iPLAN: Interactive and Procedural Layout Planning HINT: Hierarchical Neuron Concept Explainer⭐code Visual Abductive Reasoning⭐code A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration⭐code Learning Structured Gaussians to
Approximate Deep Ensembles Self-Supervised Image Representation Learning with Geometric Set Consistency Balanced Multimodal Learning via On-the-fly Gradient Modulation😮oral⭐code CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters⭐code Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation😮oral Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian Long-term Visual Map Sparsification with Heterogeneous GNN Clean Implicit 3D Structure from Noisy 2D STEM Images Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets⭐code CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism⭐code🏠project Fast Light-Weight Near-Field Photometric Stereo Fast, Accurate and Memory-Efficient Partial Permutation Synchronization Multi-Robot Active Mapping via Neural Bipartite Graph Matching Learning Program Representations for Food Images and Cooking Recipes😮oral⭐code🏠project Iterative Deep Homography Estimation⭐code Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain Generating High Fidelity Data from Low-density Regions using Diffusion Models Continuous Scene Representations for Embodied AI⭐code🏠project It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps Reflection and Rotation Symmetry Detection via Equivariant Learning⭐code🏠project Exploiting Explainable Metrics for Augmented SGD⭐code On the Importance of Asymmetry for Siamese Representation Learning⭐code Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression Perception Prioritized Training of Diffusion Models⭐code LASER: LAtent SpacE Rendering for 2D Visual Localization😮oral Efficient Maximal Coding Rate Reduction by Variational Forms Exemplar-based Pattern Synthesis with
Implicit Periodic Field Network Progressive Minimal Path Method with Embedded CNN Online Convolutional Re-parameterization⭐code Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes⭐code Leveraging Equivariant Features for Absolute Pose Regression Neural Convolutional Surfaces🏠project GLASS: Geometric Latent Augmentation for Shape Spaces⭐code🏠project Total Variation Optimization Layers for Computer Vision Identifying Ambiguous Similarity Conditions via Semantic Matching⭐code📰explainer TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates Gravitationally Lensed Black Hole Emission Tomography⭐code🏠project📺video Robust and Accurate Superquadric Recovery: a Probabilistic Approach⭐code Projective Manifold Gradient Layer for Deep Rotation Regression⭐code Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale⭐code Single-Photon Structured Light Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention😮oral⭐code Defensive Patches for Robust Recognition in the Physical World⭐code📰explainer Event-aided Direct Sparse Odometry😮oral⭐code🏠project📺video Deep Unlearning via Randomized Conditionally Independent Hessians⭐code Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data Towards Data-Free Model Stealing in a Hard Label Setting⭐code🏠project Proto2Proto: Can you recognize the car, the way I do?⭐code Balanced MSE for Imbalanced Visual Regression😮oral⭐code📰CVPR 2022 (Oral) | Imbalanced regression labels? Try Balanced MSE.
Leveraging Unlabeled Data for Sketch-based Understanding Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction⭐code🏠project Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs⭐code📰explainer RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality⭐code📰explainer An Image Patch is a Wave: Quantum Inspired Vision MLP😮oral⭐code A ConvNet for the 2020s⭐code NeuralHDHair: Automatic High-fidelity Hair Modeling from a Single Image Using Implicit Neural Representations (Hair modeling: builds a high-fidelity hair model from a single image using implicit neural representations. From the CAD&CG group at Zhejiang University, ETH Zurich, and City University of Hong Kong.) A Unified Framework for Implicit Sinkhorn Differentiation⭐code📰explainer Towards Better Understanding Attribution Methods⭐code Universal Photometric Stereo Network using Global Lighting Contexts⭐code🏠project📺video📰explainer Estimating Example Difficulty Using Variance of Gradients One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching Holocurtains: Programming Light Curtains via Binary Holography Do Learned Representations Respect Causal Relationships? CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly Mixed Differential Privacy in Computer Vision Which Model To Transfer?
Finding the Needle in the Growing Haystack Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss⭐code RAGO: Recurrent Graph Optimizer For Multiple Rotation Averaging⭐code Virtual Elastic Objects🏠project Bayesian Invariant Risk Minimization⭐code Shape From Polarization for Complex Scenes in the Wild⭐code Non-Iterative Recovery from Nonlinear Observations using Generative Models Moving Window Regression: A Novel Approach to Ordinal Regression⭐code Generative Flows With Invertible Attentions Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers The Flag Median and FlagIRLS Implicit Feature Decoupling With Depthwise Quantization⭐code UNIST: Unpaired Neural Implicit Shape Translation Network⭐code🏠project Mutual Information-Driven Pan-Sharpening A Framework for Learning Ante-Hoc Explainable Models via Concepts SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision⭐code Convolutions for Spatial Interaction Modeling FastDOG: Fast Discrete Optimization on GPU⭐code Convolution of Convolution: Let Kernels Spatially Collaborate⭐code Generalized Category Discovery⭐code🏠project Maximum Consensus by Weighted Influences of Monotone Boolean Functions Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery⭐code Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space Less Is More: Generating Grounded Navigation Instructions From Landmarks HEAT: Holistic Edge Attention Transformer for Structured Reconstruction⭐code🏠project Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization⭐code How Well Do Sparse ImageNet Models Transfer?⭐code REX: Reasoning-Aware and Grounded Explanation⭐code Coherent Point Drift Revisited for Non-Rigid 
Shape Matching and Registration Hire-MLP: Vision MLP via Hierarchical Rearrangement⭐code One-Bit Active Query With Contrastive Pairs Sparse Non-Local CRF Dataset Distillation by Matching Training Trajectories⭐code🏠project Deep Decomposition for Stochastic Normal-Abnormal Transport😮oral Parametric Scattering Networks⭐code ScaleNet: A Shallow Architecture for Scale Estimation⭐code Learning To Solve Hard Minimal Problems Learning Canonical F-Correlation Projection for Compact Multiview Representation CellTypeGraph: A New Geometric Computer Vision Benchmark⭐code RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data⭐code🏠project Neural Prior for Trajectory Estimation ActiveZero: Mixed Domain Learning for Active Stereovision with Zero Annotation⭐code Global-Aware Registration of Less-Overlap RGB-D Scans Efficient Deep Embedded Subspace Clustering Rep-Net: Efficient On-Device Learning via Feature Reprogramming⭐code WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery FLAVA: A Foundational Language and Vision Alignment Model⭐code🏠project Scanline Homographies for Rolling-Shutter Plane Absolute Pose⭐code Understanding Uncertainty Maps in Vision With Statistical Testing⭐code B-Cos Networks: Alignment Is All We Need for Interpretability Learning to Collaborate in Decentralized Learning of Personalized Models📰explainer 360-Attack: Distortion-Aware Perturbations From Perspective-Views A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors A Hybrid Quantum-Classical Algorithm for Robust Fitting⭐code Topology Preserving
Local Road Network Estimation From Single Onboard Camera Image⭐code
RendNet: Unified 2D/3D Recognizer With Latent Space Rendering
Towards Real-World Navigation With Deep Differentiable Planners⭐code
An Iterative Quantum Approach for Transformation Estimation From Point Sets
UnweaveNet: Unweaving Activity Stories⭐code
Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations⭐code
Learning Video Representations of Human Motion From Synthetic Data
TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing⭐code
The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions⭐code🏠project
Simple but Effective: CLIP Embeddings for Embodied AI⭐code
Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations⭐code
Recall@k Surrogate Loss With Large Batches and Similarity Mixup⭐code
Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport⭐code
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
HeadNeRF: A Real-Time NeRF-Based Parametric Head Model⭐code
Replacing Labeled Real-Image Datasets With Auto-Generated Contours⭐code🏠project
Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees
Omnivore: A Single Model for Many Visual Modalities🏠project
Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers
Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources⭐code🏠project
Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening⭐code
HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture🏠project
Deep Image-based Illumination Harmonization⭐code
Ditto: Building Digital Twins of Articulated Objects From Interaction😮oral⭐code🏠project
TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed
Masked Autoencoders Are Scalable Vision Learners
Neural Inertial Localization⭐code🏠project
Neural Recognition of Dashed Curves With Gestalt Law of Continuity
BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation🏠project
Merry Go Round: Rotate a Frame and Fool a DNN
Modeling sRGB Camera Noise With Normalizing Flows🏠project
Co-Advise: Cross Inductive Bias Distillation⭐code
Automatic Relation-Aware Graph Network Proliferation⭐code
Stereo Magnification With Multi-Layer Images⭐code🏠project
CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data
Rethinking Controllable Variational Autoencoders
BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster
HARA: A Hierarchical Approach for Robust Rotation Averaging⭐code
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation😮oral⭐code🏠project
Learning Fair Classifiers with Partially Annotated Group Labels⭐code
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction
High-Fidelity Human Avatars From a Single RGB Camera⭐code🏠project
RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry
How Good Is Aesthetic Ability of a Fashion Model?⭐code
Learning With Neighbor Consistency for Noisy Labels
GeoEngine: A Platform for Production-Ready Geospatial Research
Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction
On the Integration of Self-Attention and Convolution⭐code
Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector⭐code
MAXIM: Multi-Axis MLP for Image Processing😮oral⭐code
Delving Into the Estimation Shift of Batch Normalization in a Network⭐code
Learning Object Context for Novel-View Scene Layout Generation
Dist-PU: Positive-Unlabeled Learning From a Label Distribution Perspective
Relative Pose From a Calibrated and an Uncalibrated Smartphone Image
The Devil Is in the Margin: Margin-Based Label Smoothing for Network Calibration⭐code
The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference⭐code
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels⭐code
Parameter-Free Online Test-Time Adaptation😮oral⭐code
AlignMixup: Improving Representations by Interpolating Aligned Features⭐code
HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging⭐code
Brain-inspired Multilayer Perceptron with Spiking Neurons
SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm⭐code
Split Hierarchical Variational Compression
Privacy Preserving Partial Localization
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent From the Decision Boundary Perspective⭐code
Frame Averaging for Equivariant Shape Space Learning
Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation⭐code
Co-domain Symmetry for Complex-Valued Deep Learning
DeepCurrents: Learning Implicit Representations of Shapes With Boundaries
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention
Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture⭐code
Cycle-Consistent Counterfactuals by Latent Transformations
FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks
Local Texture Estimator for Implicit Representation Function⭐code
Degree-of-Linear-Polarization-Based Color Constancy
Learning To Learn by Jointly Optimizing Neural Architecture and Weights
Discrete Time Convolution for Fast Event-Based Stereo
SelfD: Self-Learning Large-Scale Driving Policies From the Web
Autofocus for Event Cameras⭐code🏠project
Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3)
3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies
PNP: Robust Learning from Noisy Labels by Probabilistic Noise Prediction
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions⭐code
Contrastive Conditional Neural Processes
Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video😮oral⭐code🏠project
Scenic: A JAX Library for Computer Vision Research and Beyond⭐code
Calibrating Deep Neural Networks by Pairwise Constraints
Deep Saliency Prior for Reducing Visual Distraction🏠project
Efficient Large-Scale Localization by Global Instance Recognition
VisualHow: Multimodal Problem Solving⭐code
Learning To Generate Line Drawings That Convey Geometry and Semantics⭐code🏠project📺video
On Guiding Visual Attention With Language Specification⭐code
Learning To Align Sequential Actions in the Wild⭐code
A Sampling-Based Approach for Efficient Clustering in Large Datasets⭐code
AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks
Pooling Revisited: Your Receptive Field Is Suboptimal
Learning to Find Good Models in RANSAC⭐code
Image Disentanglement Autoencoder for Steganography Without Embedding⭐code
Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models
Globetrotter: Connecting Languages by Connecting Images😮oral⭐code🏠project
Symmetry-Aware Neural Architecture for Embodied Visual Exploration
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings
Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders
HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos⭐code
Stereoscopic Universal Perturbations Across Different Architectures and Datasets⭐code
Learned Queries for Efficient Local Attention😮oral⭐code
Structure-Aware Flow Generation for Human Body Reshaping⭐code
A Structured Dictionary Perspective on Implicit Neural Representations
The Implicit Values of a Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement😮oral⭐code🏠project
How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision
Enabling Equivariance for Arbitrary Lie Groups⭐code
Robust Fine-Tuning of Zero-Shot Models
SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images⭐code🏠project
Compressing Models With Few Samples: Mimicking Then Replacing⭐code
Weakly but Deeply Supervised Occlusion-Reasoned Parametric Road Layouts
Exposure Normalization and Compensation for Multiple-Exposure Correction⭐code
Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching⭐code
Optimal LED Spectral Multiplexing for NIR2RGB Translation⭐code
Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects⭐code🏠project
Transferability Metrics for Selecting Source Model Ensembles
Adversarial Parametric Pose Prior⭐code
RAMA: A Rapid Multicut Algorithm on GPU⭐code
RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks
Complex Backdoor Detection by Symmetric Feature Differencing
Bilateral Video Magnification Filter
Disentangling Visual and Written Concepts in CLIP
Image Animation With Perturbed Masks⭐code
Hyperspherical Consistency Regularization
